Raft state machine tweaks #4684

Merged
merged 5 commits into from
Oct 20, 2023

@neilalexander neilalexander commented Oct 20, 2023

This PR does the following:

  1. Replaces state with an atomic, which can be read lock-free instead of taking the group mutex
  2. Ensures that runAsFollower, runAsLeader and runAsCandidate keep running only while the state still matches their role (otherwise they stop looping and hand back to run)
  3. Removes the isLeader atomic, which is unnecessary now that state is atomic
  4. Adds a unit test proving that we can now recover when all nodes are forced into a leaderless follower state; the test passes with the above changes
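
The pattern in points 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the code in server/raft.go: the `node` type, the `steps` channel (standing in for real Raft events such as timeouts and votes), the `Closed` state and the single `runAs` helper are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RaftState is a hypothetical stand-in for the node's role.
type RaftState int32

const (
	Follower RaftState = iota
	Candidate
	Leader
	Closed
)

// node is an illustrative type, not the server's raft struct.
type node struct {
	state atomic.Int32   // role, readable without taking a mutex
	steps chan RaftState // scripted transitions standing in for real events
}

// State reads the current role lock-free.
func (n *node) State() RaftState { return RaftState(n.state.Load()) }

func (n *node) switchState(s RaftState) { n.state.Store(int32(s)) }

// Leader is derived from the state, so no separate isLeader flag is needed.
func (n *node) Leader() bool { return n.State() == Leader }

// run dispatches to the loop for the current role and records each role visited.
func (n *node) run() []RaftState {
	var visited []RaftState
	for {
		s := n.State()
		visited = append(visited, s)
		if s == Closed {
			return visited
		}
		n.runAs(s)
	}
}

// runAs keeps looping only while the node is still in the given role.
// As soon as the state changes it returns, handing control back to run.
func (n *node) runAs(role RaftState) {
	for n.State() == role {
		next, ok := <-n.steps
		if !ok {
			n.switchState(Closed)
			return
		}
		n.switchState(next)
	}
}

func main() {
	n := &node{steps: make(chan RaftState, 3)}
	n.steps <- Candidate
	n.steps <- Leader
	n.steps <- Follower
	close(n.steps)
	// Prints the sequence of roles the node passed through.
	fmt.Println(n.run())
}
```

Because every role loop re-checks the atomic on each iteration, a state change made anywhere else takes effect at the next iteration without the loop holding a lock.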

Signed-off-by: Neil Twigg [email protected]

@neilalexander neilalexander requested a review from a team as a code owner October 20, 2023 14:31
@derekcollison derekcollison self-requested a review October 20, 2023 15:07
@neilalexander

Currently investigating:

=== RUN   TestJetStreamClusterDetectOrphanNRGs
panic: close of closed channel
goroutine 72044 [running]:
github.com/nats-io/nats-server/v2/server.(*raft).shutdown(0xc014fc6300, 0x0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/raft.go:1509 +0xb2
github.com/nats-io/nats-server/v2/server.(*raft).Stop(0xc012ca67a0?)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/raft.go:1494 +0x29
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorConsumer(0xc0043a9000, 0xc00024b900, 0xc00171da70)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:4603 +0x1533
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateConsumer.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:4295 +0x47
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3700 +0x5a
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine in goroutine 71984
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3698 +0x1bb
FAIL	github.com/nats-io/nats-server/v2/server	370.200s
FAIL
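
For context, Go panics with "close of closed channel" whenever `close` is called twice on the same channel, which is what the trace shows happening inside shutdown. One common way to make a stop path safe to call more than once is to gate the close behind an atomic compare-and-swap. The sketch below illustrates that general technique only; it is not necessarily how the failure was resolved in this PR, and the `stopper` type and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stopper is a hypothetical type, not the server's raft struct.
type stopper struct {
	closed atomic.Bool
	quit   chan struct{}
}

func newStopper() *stopper { return &stopper{quit: make(chan struct{})} }

// Stop closes quit at most once. It returns true only for the caller
// that actually performed the close; later or concurrent callers get false.
func (s *stopper) Stop() bool {
	if !s.closed.CompareAndSwap(false, true) {
		return false
	}
	close(s.quit)
	return true
}

func main() {
	s := newStopper()
	var wg sync.WaitGroup
	var closers atomic.Int32
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.Stop() {
				closers.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("closers:", closers.Load()) // prints "closers: 1"
}
```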

@derekcollison derekcollison left a comment

LGTM

@derekcollison derekcollison left a comment

LGTM

@neilalexander neilalexander merged commit 0ee1e0e into main Oct 20, 2023
4 checks passed
@neilalexander neilalexander deleted the neil/raftnoleader branch October 20, 2023 16:29
neilalexander added a commit that referenced this pull request Nov 2, 2023
This PR backports the following into the v2.9.24 branch:

* #4684 
* #4725
* #4727

Signed-off-by: Neil Twigg <[email protected]>