consul.passthru.tests: Fix failure on current consul. Fixes NixOS#90613.
Done by setting `autopilot.min_quorum = 3`.

Technically, this setting would have been required all along to keep the test
correct, since Consul's autopilot "Dead Server Cleanup" has been enabled by
default (I believe since Consul 0.8). In practice, the issue only showed up in
our NixOS test with releases >= `1.7.0-beta2` (see NixOS#90613). The setting
itself has been available since Consul 1.6.2.
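To illustrate what the setting changes, here is a simplified Python model of autopilot's dead-server pruning. It assumes only the documented rule that autopilot stops pruning dead servers once the server count reaches `min_quorum`; the `Server` class and the `prune_dead_servers` and `raft_quorum` functions are hypothetical illustrations, not Consul code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """Hypothetical stand-in for a Consul server in the Raft peer set."""
    name: str
    alive: bool


def raft_quorum(num_peers: int) -> int:
    """Votes needed for a Raft majority: floor(n / 2) + 1."""
    return num_peers // 2 + 1


def prune_dead_servers(peers: List[Server], min_quorum: int) -> List[Server]:
    """Simplified model (assumption) of autopilot's Dead Server Cleanup:
    drop dead servers from the peer set, but never shrink it below min_quorum."""
    remaining = list(peers)
    for server in peers:
        if not server.alive and len(remaining) > min_quorum:
            remaining.remove(server)
    return remaining


# One of three servers is down, e.g. while it is being rebooted.
peers = [Server("server1", True), Server("server2", False), Server("server3", True)]

# In this model, min_quorum=0 stands in for "no minimum configured".
print([s.name for s in prune_dead_servers(peers, min_quorum=0)])  # ['server1', 'server3']
print([s.name for s in prune_dead_servers(peers, min_quorum=3)])  # ['server1', 'server2', 'server3']
print(raft_quorum(3))  # 2
```

In this model, without a minimum the server being rebooted is pruned and the peer set shrinks to two; with `min_quorum = 3` the peer set keeps all three servers.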

However, this setting was not documented clearly enough for anybody to notice;
only the upstream issue I filed, hashicorp/consul#8118, brought it to light.

As explained there, the test could also have been made to pass by applying the
more correct rolling-reboot procedure (waiting on the Raft peer list rather
than on `consul members`):

    -m.wait_until_succeeds("[ $(consul members | grep -o alive | wc -l) == 5 ]")
    +m.wait_until_succeeds(
    +    "[ $(consul operator raft list-peers | grep true | wc -l) == 3 ]"
    +)

but we also intend to test that Consul can regain consensus even if
the quorum gets temporarily broken.
nh2 committed Jun 18, 2020
1 parent 25d6656 commit a59a972
nixos/tests/consul.nix (4 additions, 0 deletions):
    @@ -73,6 +73,10 @@ let
       extraConfig = defaultExtraConfig // {
         server = true;
         bootstrap_expect = numConsensusServers;
    +    # Tell Consul that we never intend to drop below this many servers.
    +    # Ensures to not permanently lose consensus after temporary loss.
    +    # See https://github.com/hashicorp/consul/issues/8118#issuecomment-645330040
    +    autopilot.min_quorum = numConsensusServers;
         retry_join =
           # If there's only 1 node in the network, we allow self-join;
           # otherwise, the node must not try to join itself, and join only the other servers.
