Teach Nomad servers how to fall back to Consul. #1276
Conversation
@@ -63,8 +66,14 @@ func testServer(t *testing.T, cb func(*Config)) *Server {
	// Enable raft as leader if we have bootstrap on
	config.RaftConfig.StartAsLeader = !config.DevDisableBootstrap

	shutdownCh := make(chan struct{})
	consulSyncer, err := consul.NewSyncer(config.ConsulConfig, shutdownCh, log.New(os.Stderr, "", log.LstdFlags))
You shouldn't have to make a new logger; there should already be one in the config, I believe.
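A minimal sketch of that suggestion, assuming the server config already carries a logger (the `config.Logger` field name is an assumption for illustration, not the actual Nomad config field):

```go
// Reuse the logger already carried by the config instead of constructing
// a new one in the test helper (config.Logger is a hypothetical field).
consulSyncer, err := consul.NewSyncer(config.ConsulConfig, shutdownCh, config.Logger)
if err != nil {
	t.Fatalf("err: %v", err)
}
```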
Force-pushed from cdcde82 to 2589b1d
LGTM
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest. Server: Search factor increased from 3 to 5 times the `bootstrap_expect`. This should allow for faster convergence in large environments (e.g. sub-5min for 10K Consul DCs).
Per discussion, we want to be aggressive about fanning out rather than possibly fixating on only local DCs. With RPC forwarding in place, a random walk may be less optimal from a network latency perspective, but it is guaranteed to eventually result in a converged state because all DCs are candidates during the bootstrapping process.
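As a rough illustration of that fan-out (not the committed implementation; the package, `maxRandomDCs`, and `candidateDCs` names are made up), selecting the nearest DC plus a bounded random sample of the rest might look like:

```go
package discovery // illustrative only

import "math/rand"

// maxRandomDCs mirrors the "8 random DCs" limit described above.
const maxRandomDCs = 8

// candidateDCs keeps the nearest datacenter (Consul's catalog API returns
// datacenters sorted by estimated round-trip time, so dcs[0] is treated as
// nearest) and adds up to maxRandomDCs randomly chosen others.
func candidateDCs(dcs []string) []string {
	if len(dcs) == 0 {
		return nil
	}
	nearest := dcs[0]
	rest := append([]string(nil), dcs[1:]...)
	rand.Shuffle(len(rest), func(i, j int) { rest[i], rest[j] = rest[j], rest[i] })
	if len(rest) > maxRandomDCs {
		rest = rest[:maxRandomDCs]
	}
	return append([]string{nearest}, rest...)
}
```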
It is perfectly viable for an admin to downsize a Nomad server cluster to 1, 2, or `num % 2 == 0` members (however ill-advised such activities may be). So instead of relying on `bootstrap_expect`, this uses a timeout-based strategy: if the `bootstrapFn` hasn't observed a leader within 15s, it falls back to Consul and polls every ~60s until it sees a leader.
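A hedged sketch of such a timeout-based fallback (illustrative only; the method shape, `bootstrapLoop`, and the way `bootstrapFn` is invoked are assumptions, only the 15s/60s intervals come from the description above):

```go
// Sketch of a timeout-based bootstrap fallback: wait up to 15s for a Raft
// leader, otherwise query Consul and retry roughly every 60s.
const (
	leaderWaitTimeout  = 15 * time.Second
	consulPollInterval = 60 * time.Second
)

func (s *Server) bootstrapLoop(shutdownCh <-chan struct{}) {
	timer := time.NewTimer(leaderWaitTimeout)
	defer timer.Stop()
	for {
		select {
		case <-shutdownCh:
			return
		case <-timer.C:
			if s.raft.Leader() != "" {
				// A leader is known; check again later rather than
				// querying Consul unnecessarily.
				timer.Reset(consulPollInterval)
				continue
			}
			// No leader observed: fall back to Consul to discover and
			// join other Nomad servers, then poll again in ~60s.
			if err := s.bootstrapFn(); err != nil {
				s.logger.Printf("[WARN] server.consul: bootstrap via Consul failed: %v", err)
			}
			timer.Reset(consulPollInterval)
		}
	}
}
```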
Force-pushed from 2589b1d to a4cfb2b
consulServices, _, err := consulCatalog.Service(nomadServerServiceName, consul.ServiceTagSerf, consulOpts)
if err != nil {
	s.logger.Printf("[WARN] server.consul: failed to query service %+q in Consul datacenter %+q: %v", nomadServerServiceName, dc, err)
	mErr.Errors = append(mErr.Errors, fmt.Errorf("unable to query service %q from Consul datacenter %q: %v", nomadServerServiceName, dc, err))
I would define the error once and then use it both in the log message and when appending to mErr. Also, don't attempt to join the server with itself.
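A minimal sketch of that first suggestion, reusing the surrounding names from the diff (the `qErr` variable name is just illustrative):

```go
// Build the error once and reuse it for both the log line and mErr,
// instead of formatting essentially the same message twice.
qErr := fmt.Errorf("unable to query service %q from Consul datacenter %q: %v",
	nomadServerServiceName, dc, err)
s.logger.Printf("[WARN] server.consul: %v", qErr)
mErr.Errors = append(mErr.Errors, qErr)
```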
@@ -501,6 +511,9 @@ func (s *Server) setupBootstrapHandler() error {
	if addr == "" {
		addr = cs.Address
	}
	if localNode.Addr.String() == addr && int(localNode.Port) == cs.ServicePort {
Put a comment that we are skipping ourselves
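Something along these lines (a sketch, not the exact committed change) would address that:

```go
// Skip our own Consul service entry so the server does not attempt to
// Serf-join itself.
if localNode.Addr.String() == addr && int(localNode.Port) == cs.ServicePort {
	continue
}
```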
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
It is now possible to omit `retry_join` from server configs if Consul is available.