panic: runtime error: index out of range (raft) #754

Closed
stuart-warren opened this issue Feb 26, 2019 · 3 comments

Comments

@stuart-warren

It seems that if the volume that Raft is writing to is full, the server may panic.

[1] 2019/02/26 13:59:29.236301 [INF] STREAM: Starting nats-streaming-server[example-nats-stream] version 0.12.0
[1] 2019/02/26 13:59:29.236604 [INF] STREAM: ServerID: qcayxlT0HDXOuvhggVU5OO
[1] 2019/02/26 13:59:29.236611 [INF] STREAM: Go version: go1.11.5
[1] 2019/02/26 13:59:29.267906 [INF] STREAM: Recovering the state...
[1] 2019/02/26 13:59:29.962513 [INF] STREAM: Recovered 2 channel(s)
[1] 2019/02/26 13:59:29.962568 [INF] STREAM: Cluster Node ID : example-nats-stream-2
[1] 2019/02/26 13:59:29.962571 [INF] STREAM: Cluster Log Path: /data/log
[1] 2019/02/26 13:59:30.532884 [DBG] STREAM: Loaded existing state for Raft group example-nats-stream
[1] 2019/02/26 13:59:30.533005 [DBG] STREAM: Discover subject:           _STAN.discover.example-nats-stream
[1] 2019/02/26 13:59:30.533011 [DBG] STREAM: Publish subject:            _STAN.pub.example-nats-stream.>
[1] 2019/02/26 13:59:30.533013 [DBG] STREAM: Subscribe subject:          _STAN.sub.example-nats-stream
[1] 2019/02/26 13:59:30.533016 [DBG] STREAM: Subscription Close subject: _STAN.subclose.example-nats-stream
[1] 2019/02/26 13:59:30.533018 [DBG] STREAM: Unsubscribe subject:        _STAN.unsub.example-nats-stream
[1] 2019/02/26 13:59:30.533020 [DBG] STREAM: Close subject:              _STAN.close.example-nats-stream
[1] 2019/02/26 13:59:30.533751 [INF] STREAM: Message store is RAFT_FILE
[1] 2019/02/26 13:59:30.533769 [INF] STREAM: Store location: /data/store
[1] 2019/02/26 13:59:30.533840 [INF] STREAM: ---------- Store Limits ----------
[1] 2019/02/26 13:59:30.533847 [INF] STREAM: Channels:                  100 *
[1] 2019/02/26 13:59:30.533850 [INF] STREAM: --------- Channels Limits --------
[1] 2019/02/26 13:59:30.533853 [INF] STREAM:   Subscriptions:          1000 *
[1] 2019/02/26 13:59:30.533855 [INF] STREAM:   Messages     :       1000000 *
[1] 2019/02/26 13:59:30.533858 [INF] STREAM:   Bytes        :     976.56 MB *
[1] 2019/02/26 13:59:30.533861 [INF] STREAM:   Age          :     unlimited *
[1] 2019/02/26 13:59:30.533863 [INF] STREAM:   Inactivity   :     unlimited *
[1] 2019/02/26 13:59:30.533886 [INF] STREAM: ----------------------------------
panic: runtime error: index out of range

goroutine 73 [running]:
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*DB).page(...)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/db.go:880
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*Tx).rollback(0xc000e8a460)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/tx.go:267 +0xe7
github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt.(*Tx).Commit(0xc000e8a460, 0xc0000277b8, 0x8)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/go.etcd.io/bbolt/tx.go:186 +0x500
github.com/nats-io/nats-streaming-server/server.(*raftLog).StoreLogs(0xc000132180, 0xc0002c4380, 0x10, 0x10, 0x175, 0x17984a)
	/go/src/github.com/nats-io/nats-streaming-server/server/raft_log.go:263 +0x22b
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*LogCache).StoreLogs(0xc0003001c0, 0xc0002c4380, 0x10, 0x10, 0xc0002c0300, 0x3d)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/log_cache.go:61 +0xec
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).appendEntries(0xc000158b00, 0xa23660, 0xc00006cba0, 0x0, 0x0, 0xc00006cb40, 0xc00006cba0)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/raft.go:1083 +0x930
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).processRPC(0xc000158b00, 0xa23660, 0xc00006cba0, 0x0, 0x0, 0xc00006cb40)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/raft.go:953 +0x22a
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).runFollower(0xc000158b00)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/raft.go:150 +0xe71
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).run(0xc000158b00)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/raft.go:132 +0x92
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*Raft).run-fm()
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/api.go:505 +0x2a
github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc000158b00, 0xc00012b620)
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft.(*raftState).goFunc
	/go/src/github.com/nats-io/nats-streaming-server/vendor/github.com/hashicorp/raft/state.go:144 +0x66

It took quite a while to confirm that the volume was full, as there are no tools in the nats-streaming:0.12.0 image.
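
For anyone in the same position, here is a minimal sketch of a free-space check that could be compiled as a static Go binary and copied into a container that has no shell tools. It is only an illustration: `freeBytes` is a hypothetical helper, the default paths are taken from the log output above, and `syscall.Statfs` is assumed to be available (Linux and other Unix-like systems).

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeBytes is a hypothetical helper: it returns the number of bytes
// available to unprivileged users on the filesystem containing path.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	// Bavail is a block count and Bsize is the block size in bytes.
	return st.Bavail * uint64(st.Bsize), nil
}

func main() {
	// Default to the Raft log and store locations shown in the log above;
	// other paths can be passed as command-line arguments instead.
	paths := []string{"/data/log", "/data/store"}
	if len(os.Args) > 1 {
		paths = os.Args[1:]
	}
	for _, p := range paths {
		n, err := freeBytes(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %d bytes available\n", p, n)
	}
}
```

Building with `CGO_ENABLED=0` should produce a binary with no external dependencies, which is what makes it usable in a minimal image.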

If stopping is the correct thing, can we improve the error/log message?

@kozlovic
Member

Thank you for the report, and glad to know that you figured out that this was due to a full volume (I would probably not have figured this out).
As you can see, the panic occurs deep down in the raft/boltdb code, so I am not sure how I will be able to intercept and alter the error message, but I will think about it more.
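
One possible direction, sketched here purely as an illustration and not as what the server does: the trace shows the panic unwinding through `server.(*raftLog).StoreLogs`, so a deferred `recover` at that level could, in principle, turn the panic into an ordinary error with a hint for the operator. The name `safeCommit` and the hint text are hypothetical, and the failing commit is simulated with a plain slice access rather than a real boltdb call.

```go
package main

import (
	"fmt"
	"os"
)

// safeCommit is a hypothetical wrapper: it runs commit and converts a
// panic raised underneath it into an error carrying an operator hint.
func safeCommit(commit func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("store commit panicked (is the volume full?): %v", r)
		}
	}()
	return commit()
}

func main() {
	// Stand-in for a failing commit: indexing past the end of a slice
	// raises the same class of runtime error as in the trace above.
	err := safeCommit(func() error {
		pages := make([]int, 2)
		i := 5
		_ = pages[i]
		return nil
	})
	if err != nil {
		// A real server would likely log this and shut down here.
		fmt.Fprintln(os.Stderr, "fatal:", err)
	}
}
```

Recovering this way only improves the message. After a panic inside a transaction the store may be left in an unknown state, so the caller would probably still need to treat the error as fatal rather than continue.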

@kozlovic
Member

@stuart-warren I believe that the panic occurred because I started to use boltdb.NoFreelistSync. I have opened an issue with them: etcd-io/bbolt#152
I may revert the use of this flag for the 0.12.2 release that I plan to publish this week, until the above issue is fixed.
In your situation, where the volume was full, I believe that the error would have been properly reported if boltdb had not panicked. I will keep this issue open for now.

@kozlovic
Member

Closing for now since PR #766 may prevent the panic (and so you would have seen the reason for the write failure). That being said, it is always possible to get a panic if boltdb is unable to recover from a failure. I would recommend having a look at this comment: #769 (comment) for ideas on what to do to recover from a panic on node restart.
Thanks again for the report!
