0.7.3->0.8.4 upgrade leads to protocol version (2) is incompatible: [1, 0] #3217
Comments
Hi @kamaradclimber these errors are related to the Serf protocol:
Consul 0.7 dropped support for protocol version 1, which hasn't been used since Consul 0.3. What version of Consul is running on web-fbx019-ty5.ty5, hostw145-ty5.ty5, and couchs01e23-ty5?
according to consul members, those nodes are running 0.7.3-criteo1:
here is a summary of versions in that consul cluster:
We finally solved the issue by restarting a consul server that looked suspicious. That server was identified because many agents in the serf cluster were not seeing it as part of the serf cluster. For instance, some agents saw:
while others (including the faulty server, consul03-ty5.central.criteo.prod) saw:
Just display information about how min/max protocols are computed.
Change-Id: I91d264ac90c7f37cbbb006a0efdcf012bdfe8b37
Using code from criteo-forks@4e58f83 during the incident, we could see:
I think the weird part can be seen when minpmax is set to 0.
Anyway, the incident is closed on our side. I'd be happy to contribute to a discussion on how to detect such a scenario better.
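For readers hitting this for the first time, here is a minimal, self-contained sketch of the mechanism being debugged above. It is not the actual memberlist/Serf negotiation code, and the type and field names are illustrative; it only shows how an acceptable protocol range derived from every member's advertised min/max can be collapsed to [1, 0] by a single entry claiming pMin = pMax = 0:

```go
package main

import "fmt"

// member holds the protocol range a node advertises when it joins.
// The names are illustrative, not the real memberlist types.
type member struct {
	name       string
	pMin, pMax uint8
}

// acceptableRange computes the range a joining node must satisfy:
// at least the highest pMin and at most the lowest pMax advertised
// by existing members.
func acceptableRange(members []member) (lo, hi uint8) {
	lo, hi = 0, 255
	for _, m := range members {
		if m.pMin > lo {
			lo = m.pMin
		}
		if m.pMax < hi {
			hi = m.pMax
		}
	}
	return lo, hi
}

func main() {
	healthy := []member{
		{"server-1", 1, 5},
		{"agent-1", 1, 5},
	}
	lo, hi := acceptableRange(healthy)
	fmt.Printf("healthy cluster accepts [%d, %d]\n", lo, hi) // [1, 5]

	// A single corrupt entry claiming pMin = pMax = 0 collapses the range
	// to the empty interval [1, 0]: a joining node speaking protocol 2 is
	// then rejected as "incompatible: [1, 0]".
	poisoned := append(healthy, member{"corrupt-node", 0, 0})
	lo, hi = acceptableRange(poisoned)
	fmt.Printf("poisoned cluster accepts [%d, %d]\n", lo, hi) // [1, 0]
}
```

Once such an entry is in the member list, every new joiner fails the check even though all the real nodes are healthy, which matches the behavior reported in this issue.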
Thanks for the follow-up note @kamaradclimber - not sure how things got into that state, though, and I've never seen something like this before. It's odd that the server was related to the issue but the weird version in the log is associated with
We are facing a similar issue in our environment which we haven't been able to fix so far. At the moment we have 462 nodes running Consul, and out of those 462, 375 are
Some of the messages we have seen so far look very similar to each other and may have the same root cause:
Error1
Error2
We have been wondering where those values [1, 0] come from; as far as we know, the default protocol version spoken within the cluster is version 2.
Feel free to reach out if you need more information :)
Thanks @kamaradclimber, with the patch provided we were able to identify a node that was causing the issue. Pretty much the output we got was:
After forcing the removal of the node from the cluster and wiping out its data directory, we were able to re-join the failing nodes.
I work with @kamusin; we saw this again yesterday and used the patch above to debug. There was a node that allowed inbound traffic on 8301, but not outbound, so somehow it became part of the cluster at one point, but subsequently it couldn't initiate connections to anything. Eventually it seemed like the server thought its maximum supported protocol version was 0, and nothing new could join the cluster. We resolved it by stopping consul on the node and then forcing it out of the cluster.
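As a side note, a quick way to check for that "inbound ok, outbound blocked" state from a suspect node is a plain TCP dial towards a known peer's Serf LAN port. This is only a sketch: the peer addresses below are hypothetical and 8301 assumes the default LAN gossip port:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Hypothetical peers; replace with real cluster members.
	peers := []string{
		"consul-server-1.example.internal:8301",
		"consul-server-2.example.internal:8301",
	}
	for _, p := range peers {
		// If inbound traffic works but these dials fail, the node can be
		// gossiped about by others while being unable to gossip back,
		// which is the asymmetric state described above.
		conn, err := net.DialTimeout("tcp", p, 3*time.Second)
		if err != nil {
			fmt.Printf("outbound to %s FAILED: %v\n", p, err)
			continue
		}
		conn.Close()
		fmt.Printf("outbound to %s ok\n", p)
	}
}
```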
I just experienced this same bug on a Consul cluster with 6 servers and 800+ agents, all running Consul v0.9.3. This was during normal operation of the cluster, not an upgrade as described in the original post. But as in earlier comments, new nodes could not join the cluster, and existing nodes were logging the following error at high frequency:
Oddly, each instance of the above error referenced a random node, seemingly indicating that a large number of nodes were faulty, which was not actually the case. Using @kamaradclimber's patch, I found what turned out to be a single culprit, despite the numerous errors. After forcing the bad node out of the cluster and wiping its data directory, it rejoined fine. Other nodes were then able to join without issue as well, and the protocol version error messages ceased. Another difference from the original post is that in my case the culprit was an agent, not a server.
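Short of running a patched build, one place the advertised version ranges are already exposed is the agent members endpoint, so a small scan can sometimes surface a member announcing an all-zero range. This is only a sketch using the standard Go API client against a local agent; whether the corrupt announcement shows up there depends on what the queried agent has recorded, so the patched build above remains the reliable option:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Talks to the local agent (CONSUL_HTTP_ADDR or 127.0.0.1:8500 by default).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	members, err := client.Agent().Members(false) // LAN gossip pool
	if err != nil {
		log.Fatal(err)
	}

	for _, m := range members {
		// No real node supports only version 0, so a zero max is a red flag
		// for the corruption discussed in this issue.
		if m.ProtocolMax == 0 || m.DelegateMax == 0 {
			fmt.Printf("suspicious member %s (%s): protocol [%d,%d] delegate [%d,%d]\n",
				m.Name, m.Addr, m.ProtocolMin, m.ProtocolMax, m.DelegateMin, m.DelegateMax)
		}
	}
}
```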
Tagging this as a bug (which will likely end up being fixed in memberlist or Serf). It looks like there's some case where we can poison the version-checking algorithm with zero-valued node entries in memberlist.
This is also happening on version 0.9.3, using the same version on servers and clients.
Same error while migrating from Consul 1.0.6 with patches to 1.1.0 with patches (but nothing related to the serf protocol).
Would someone from HashiCorp have any idea regarding the cause of this issue?
@kamaradclimber I just started to dive into the memberlist code yesterday afternoon and am going to continue some more today. Unfortunately I don't have much to report yet. Am I correct in thinking this only happens during upgrades? If so, how do you go about performing them? (just kill Consul and restart with a newer version?)
@mkeeler Yes, it seems it happens only when upgrading clients. It probably means that auto-negotiation does not work well. Basically we:
At this point, all the already-migrated servers had been upgraded. Yesterday, we upgraded some of the remaining clients still on 1.0.6, but during the upgrade some of the agents could not join the cluster, with this message. Our cluster contains ONLY 1.0.6 and 1.1.0 clients (and only 1.1.0 servers). After several hours of investigation, it seems the only way to recover is to restart all Consul servers sequentially; then the new agents can join the cluster properly.
@pierresouchay Are you doing Gossip Encryption? It shouldn't matter, just trying to rule things out.
Yes. Note that in that case, some of the nodes in error see each other (but no server in the list).
Related issue: #4342
Update: I experienced this issue again recently on the same Consul cluster as previously reported, which has been running Consul v1.0.2 for a while now. As with the last time this occurred for us, it was not during a Consul upgrade. However, we've noticed that the problem correlates with operations in which a large number of agents are restarted, meaning those agents have to leave the cluster and re-join it. I gather that protocol negotiation is a normal part of restarting a cluster, and that if a corrupt node somewhere in the cluster is reporting a bad protocol version, that negotiation cannot succeed. The result in our case is that agents cannot rejoin the cluster after being restarted.

Again, the only method I'm aware of for finding and removing the corrupt node from the cluster is to run a patched version of Consul with criteo-forks@4e58f83 included, and use the output from its own failed negotiation attempt to determine the bad node. Without that patch, the logs are not clear on which node is the corrupt one. Perhaps a good place to start in determining the root cause of this bug would be to add more precise logging to the memberlist and/or Serf code that can more clearly identify the culprit?

One more note about this most recent occurrence: judging by the cluster logs, the corrupt node went bad on a Friday afternoon (coinciding with an operation that restarted a large number of agents), but the problem wasn't discovered until a subsequent restart-triggering operation was performed the following Monday. I also noticed that it took a long time for the cluster to recover after the corrupt node was forcibly removed (around 15 minutes, which is much longer than it took in previous occurrences). I don't know enough about Consul's internals to speculate as to what that could mean, but it seemed worth mentioning.
@ianwestcott Yes, we confirm this exact behavior. We often see this when restarting many agents at once. Sometimes a simple restart is enough, but sometimes the only reliable way to fix it is to restart all Consul servers sequentially.
Here is more context in a new incident we had:
We can see clearly that before
No problem was seen, but the line right after does a:
and this message keeps repeating... To fix it, I had to restart all the servers sequentially. @mkeeler What this means is that a single agent DID change the range of acceptable protocols, and it then changed the acceptable range of protocols on the servers' side. Quite a good starting point for finding the root cause, what do you think? We will try to provide a PR if we find the error that causes this corruption.
On Consul, sometimes, nodes send pMin = pMax = 0 in Vsn. This causes a corruption of the acceptable protocol versions, thus requiring version = [0, 1]. After this corruption occurs, no new nodes can join anymore; it then forces the restart of all Consul servers to resume normal operations. While not fixing the root cause, this patch discards alive nodes claiming version 0,0,0 and will avoid this breakage. See hashicorp/consul#3217
* Avoid taking into account wrong protocol versions in Vsn. On Consul, sometimes, nodes send pMin = pMax = 0 in Vsn. This causes a corruption of the acceptable protocol versions, thus requiring version = [0, 1]. After this corruption occurs, no new nodes can join anymore; it then forces the restart of all Consul servers to resume normal operations. While not fixing the root cause, this patch discards alive nodes claiming version 0,0,0 and will avoid this breakage. See hashicorp/consul#3217
* Always set the Vsn when creating state, so the race condition cannot happen
* Do not move m.encodeBroadcastNotify(a.Node, aliveMsg, a, notify) since it is not needed
* Test the bare minimum for the size of Vsn
Co-Authored-By: pierresouchay <[email protected]>
* Fixed test TestMemberList_ProbeNode_Awareness_OldProtocol
* Avoid crashing when len(Vsn) is incorrect, and ignore the message when there is an Alive delegate
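For illustration, the guard described by the patch boils down to refusing to merge an alive announcement whose version vector is clearly bogus. This is only a sketch of that idea; the Vsn layout (protocol min/max/cur in the first three bytes) is an assumption here, and the real change lives in hashicorp/memberlist#178:

```go
package main

import "fmt"

// isUsableVsn returns false for version vectors that cannot belong to a
// real node: too short, or claiming pMin = pMax = pCur = 0. An alive
// message failing this check would be dropped instead of being merged
// into the member list, where it would poison the protocol range.
func isUsableVsn(vsn []uint8) bool {
	if len(vsn) < 3 {
		return false
	}
	return !(vsn[0] == 0 && vsn[1] == 0 && vsn[2] == 0)
}

func main() {
	fmt.Println(isUsableVsn([]uint8{1, 5, 2, 2, 5, 4})) // true: sane announcement
	fmt.Println(isUsableVsn([]uint8{0, 0, 0, 0, 0, 0})) // false: corrupt, ignore it
	fmt.Println(isUsableVsn(nil))                       // false: missing Vsn
}
```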
Now that the PR has been merged into memberlist, we will need to re-vendor to pull in the changes.
…ible: [1, 0]
This is fixed in hashicorp/memberlist#178; bump memberlist to fix possible split brain in Consul.
consul version
for both Client and Server

Client:
Consul v0.7.3-criteo1-criteo1 (f3d518bc+CHANGES)
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Server:
Consul v0.8.4
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
The custom criteo version is based on f3d518b and only contains patches from #2474 and #2657.
consul info
for both Client and Server

Client:
Server:
Operating system and Environment details
50% of the servers are Linux (mostly CentOS 7), 50% are Windows Server 2012 R2.
Description of the Issue (and unexpected/desired result)
Consul agents already present in the serf cluster upgraded without issues.
New consul agents added during the upgrade cannot join the cluster.
Here are the logs from an agent (on Windows):
Seems weird for many reasons: