
panic: 1a state.commit 75451 is out of range [78103, 78161] [recovered] #10834

Closed
nictuku opened this issue Nov 20, 2016 · 1 comment

nictuku commented Nov 20, 2016

This node won't ever start up:

build: beta-20161006 @ 2016/10/06 16:30:12 (go1.7.1)
admin: http://scw-96cd6e:8080
sql: postgresql://root@scw-96cd6e:26257?sslmode=disable
logs: cockroach-data/logs
store[0]: path=cockroach-data
join[0]: scw-d0bfca,scw-324c38
panic: 1a state.commit 75451 is out of range [78103, 78161] [recovered]
panic: 1a state.commit 75451 is out of range [78103, 78161]

goroutine 66 [running]:
panic(0x158f820, 0xc420839710)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cockroachdb/cockroach/util/stop.(*Stopper).Recover(0xc4203d2000)
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:177 +0x6e
panic(0x158f820, 0xc420839710)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/cockroachdb/cockroach/storage.(*raftLogger).Panicf(0xc42082bc00, 0x175e96a, 0x2b, 0xc42076e940, 0x4, 0x4)
/go/src/github.com/cockroachdb/cockroach/storage/raft.go:111 +0x107
github.com/coreos/etcd/raft.(*raft).loadState(0xc4207e1950, 0x201ba, 0x19, 0x126bb, 0x0, 0x0, 0x0)
/go/src/github.com/coreos/etcd/raft/raft.go:1091 +0x1db
github.com/coreos/etcd/raft.newRaft(0xc420a39320, 0xc420a39140)
/go/src/github.com/coreos/etcd/raft/raft.go:289 +0xc9b
github.com/coreos/etcd/raft.NewRawNode(0xc420a39320, 0x0, 0x0, 0x0, 0x6503e3, 0x5830f262, 0x3af79c09)
/go/src/github.com/coreos/etcd/raft/rawnode.go:79 +0x71
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroupLocked(0xc420843180, 0x18a5f00, 0xc420a39820, 0x88ed9b, 0xc420843220)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:411 +0x2a4
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroup(0xc420843180, 0xc420a39820, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:470 +0x92
github.com/cockroachdb/cockroach/storage.(*Store).processRaftRequest(0xc42015e840, 0x7f21e0ccc778, 0xc4206fa9c0, 0xc420c22000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/storage/store.go:2563 +0xbb8
github.com/cockroachdb/cockroach/storage.(*Store).processRequestQueue(0xc42015e840, 0x15f)
/go/src/github.com/cockroachdb/cockroach/storage/store.go:2751 +0x180
github.com/cockroachdb/cockroach/storage.(*raftScheduler).worker(0xc42040df80, 0xc4203d2000)
/go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:204 +0x308
github.com/cockroachdb/cockroach/storage.(*raftScheduler).Start.func1()
/go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:160 +0x33
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc4203d2000, 0xc4207d89c0)
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:188 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:189 +0x66
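
For context, the panic is etcd/raft's HardState validation firing while the replica's raft group is recreated at startup: newRaft calls loadState, which requires the persisted commit index to lie within [raftLog.committed, raftLog.lastIndex()]. Since raftLog.committed is initialized to firstIndex-1, the lower bound of 78103 implies the on-disk log has been truncated up through index 78103 while the persisted HardState still says commit 75451. A standalone sketch of the invariant with the values from the trace (my paraphrase, not the etcd code itself):

package main

import "fmt"

// checkHardStateCommit mirrors the range check that etcd/raft's loadState
// performs on the persisted HardState (paraphrased, not verbatim).
func checkHardStateCommit(commit, logCommitted, lastIndex uint64) error {
	if commit < logCommitted || commit > lastIndex {
		return fmt.Errorf("state.commit %d is out of range [%d, %d]",
			commit, logCommitted, lastIndex)
	}
	return nil
}

func main() {
	// The values from the panic above: commit 75451 against [78103, 78161].
	if err := checkHardStateCommit(75451, 78103, 78161); err != nil {
		fmt.Println("would panic:", err)
	}
}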

Logs from a 2nd node that I recently upgraded:

initiating graceful shutdown of server
server drained and shutdown completed
build:      beta-20161103 @ 2016/11/03 15:03:54 (go1.7.3)
admin:      http://scw-d0bfca:8080
sql:        postgresql://root@scw-d0bfca:26257?sslmode=disable
logs:       cockroach-data/logs
store[0]:   path=cockroach-data
status:     restarted pre-existing node
clusterID:  {99a27c6f-20c7-47f0-b5ef-ec82e2ed7c40}
nodeID:     3

I believe this shutdown is normal.

The 3rd node is also running the 20161006 build (same as the corrupt node) but has not shown any data corruption:

sql:       postgresql://root@scw-324c38:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   scw-d0bfca,scw-96cd6e
proto: no encoder for Error string [GetProperties]
proto: no encoder for Code int [GetProperties]
initiating graceful shutdown of server
build:     beta-20161006 @ 2016/10/06 16:30:12 (go1.7.1)
admin:     http://scw-324c38:8080
sql:       postgresql://root@scw-324c38:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   scw-d0bfca,scw-96cd6e
proto: no encoder for Error string [GetProperties]
proto: no encoder for Code int [GetProperties]

(This all looks normal to me.)

I don't know yet whether the corruption happened only after I upgraded the other node. My guess is that it predates the upgrade and may have been triggered by an OOM event; my nodes exit and restart quite often because I intentionally give them very little RAM.

Sorry if this is a terse bug report, but I'm on holiday with the family right now. I can provide more logs later, including INFO logs (the server is crash-looping, so it's producing zillions of them).

Right now I just wanted to get this tracked somewhere.

bdarnell (Contributor) commented

This sounds like #9037, especially if you're seeing a lot of OOM restarts. We have a fix for that coming soon in #10690.
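
For anyone hitting the same crash, the general failure class (independent of the specifics of the linked fix, which I haven't verified): this invariant can only regress if a HardState write was buffered but never made durable before a crash, leaving the persisted commit index behind the rest of the on-disk Raft state. A minimal sketch of the durability pattern that avoids it; saveHardState and its file handling are hypothetical, not CockroachDB APIs:

package main

import "os"

// saveHardState illustrates the pattern: write the encoded HardState, then
// fsync before acting as if the commit index advanced. Skipping the Sync
// lets a crash leave the on-disk commit index behind the log, which is the
// exact condition loadState panics on above.
func saveHardState(f *os.File, encoded []byte) error {
	if _, err := f.Write(encoded); err != nil {
		return err
	}
	return f.Sync() // without this, a crash can lose the buffered write
}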
