[BUG] RabbitMQ fails on upgrade when 2 nodes are specified that are not clustered. #1979

przemyslavic · 2021-01-18T08:31:44Z

Describe the bug
The RabbitMQ upgrade to v3.8.9 fails when the configuration contains at least 2 non-clustered nodes.

How to reproduce
Steps to reproduce the behavior:

1. Deploy a 0.8 cluster with the RabbitMQ component enabled (at least 2 VMs).
2. Upgrade the cluster to the develop branch.

Expected behavior
The cluster has been upgraded successfully.

Config files

---
kind: epiphany-cluster
name: default
provider: <provider>
specification:
  components:
    rabbitmq:
      count: 2

---        
kind: configuration/rabbitmq
title: "RabbitMQ"
provider: <provider>
name: default
specification:
  rabbitmq_plugins:
    - rabbitmq_management_agent
    - rabbitmq_management
  cluster:
    is_clustered: false

Environment

Cloud provider: [all]
OS: [all]

Additional context
The upgrade process fails on TASK [upgrade : RabbitMQ | Join a node to the cluster].

2021-01-17T00:53:40.4385876Z 00:53:40 INFO cli.engine.ansible.AnsibleCommand - TASK [upgrade : RabbitMQ | Join a node to the cluster] *************************
2021-01-17T00:53:40.7005828Z 00:53:40 INFO cli.engine.ansible.AnsibleCommand - skipping: [ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com]
2021-01-17T00:53:42.1028562Z 00:53:42 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@ec2-xx-xx-xx-xx"], "delta": "0:00:00.541081", "end": "2021-01-17 00:53:41.997904", "msg": "non-zero return code", "rc": 69, "start": "2021-01-17 00:53:41.456823", "stderr": "Error: unable to perform an operation on node '[email protected]'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node [email protected]\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['[email protected]']\n\[email protected]:\n  * connected to epmd (port 4369) on ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com\n  * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic \n  * TCP connection succeeded but Erlang distribution failed \n\n  * Authentication failed (rejected by the remote node), please check the Erlang cookie\n\n\nCurrent node details:\n * node name: 'rabbitmqcli-2076-rabbit@ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: <hash>==", "stderr_lines": ["Error: unable to perform an operation on node '[email protected]'. Please see diagnostics information and suggestions below.", "", "Most common reasons for this are:", "", " * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)", " * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)", " * Target node is not running", "", "In addition to the diagnostics info below:", "", " * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more", " * Consult server logs on node [email protected]", " * If target node is configured to use long node names, don't forget to use --longnames with CLI tools", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['[email protected]']", "", "[email protected]:", "  * connected to epmd (port 4369) on ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com", "  * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic ", "  * TCP connection succeeded but Erlang distribution failed ", "", "  * Authentication failed (rejected by the remote node), please check the Erlang cookie", "", "", "Current node details:", " * node name: 'rabbitmqcli-2076-rabbit@ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com'", " * effective user's home directory: /var/lib/rabbitmq", " * Erlang cookie hash: <hash>=="], "stdout": "Clustering node [email protected] with rabbit@ec2-xx-xx-xx-xx", "stdout_lines": ["Clustering node [email protected] with rabbit@ec2-xx-xx-xx-xx"]}

The command rabbitmqctl join_cluster is run even though we are not creating a cluster. The upgrade needs to check the value of the is_clustered parameter in the specification before running the clustering tasks.
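
A minimal sketch of the check described above, written in Python only for illustration. The helper name should_join_cluster and the node arguments are hypothetical and are not taken from the Epiphany code base; only the specification.cluster.is_clustered key comes from the config in this issue. The actual fix would most likely be a condition on the Ansible task in the upgrade role.

```python
# Hypothetical helper (not part of Epiphany): decide whether the
# "Join a node to the cluster" step should run for a given node.
def should_join_cluster(rabbitmq_config: dict, node: str, seed_node: str) -> bool:
    """Return True only if clustering is enabled and the node is not the seed node."""
    cluster = rabbitmq_config.get("specification", {}).get("cluster", {})
    # Assumption: a missing is_clustered key is treated as "not clustered".
    is_clustered = bool(cluster.get("is_clustered", False))
    return is_clustered and node != seed_node


# The configuration/rabbitmq document from this issue, reduced to the relevant key.
non_clustered = {
    "kind": "configuration/rabbitmq",
    "specification": {"cluster": {"is_clustered": False}},
}
clustered = {
    "kind": "configuration/rabbitmq",
    "specification": {"cluster": {"is_clustered": True}},
}
```

With the configuration from this report, should_join_cluster(non_clustered, "ec2-yy-yy-yy-yy", "ec2-xx-xx-xx-xx") evaluates to False, so join_cluster would be skipped on both nodes instead of failing on the second one.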

atsikham · 2021-01-19T09:10:26Z

Verified that the upgrade role does not check whether clustering is disabled. I previously added this task as a dependency for the backports: #1933, #1934, #1935.

przemyslavic · 2021-01-26T08:41:21Z

Tested together with #1984.
✅ upgrade from 3.7.10 to 3.8.9
✅ upgrade from 3.8.3 to 3.8.9
✅ 1 RabbitMQ node
✅ 2 RabbitMQ nodes
✅ 2 RabbitMQ nodes clustered
✅ Azure/AWS x Ubuntu/RHEL

atsikham · 2021-01-26T08:58:12Z

PR #1989

przemyslavic added the type/bug, area/rabbit, status/grooming-needed, and priority/high labels Jan 18, 2021

przemyslavic added this to the S20210128 milestone Jan 18, 2021

atsikham mentioned this issue Jan 19, 2021

[Backport to v0.8] Upgrade rabbitmq to v3.8.9 #1933

Closed

atsikham self-assigned this Jan 19, 2021

atsikham removed the status/grooming-needed label Jan 19, 2021

atsikham removed their assignment Jan 19, 2021

to-bar added a commit to to-bar/epiphany that referenced this issue Jan 19, 2021

Add hitachienergy#1979 to known issues

e4cede3

to-bar added a commit that referenced this issue Jan 19, 2021

Add note about deprecation of Elasticsearch v6 (#1982)

e53b395

* Deprecate Elasticsearch OSS v6 * Add #1979 to known issues

to-bar added a commit that referenced this issue Jan 19, 2021

Add note about deprecation of Elasticsearch v6 (#1982)

23d74b6

* Deprecate Elasticsearch OSS v6 * Add #1979 to known issues

atsikham assigned atsikham and unassigned atsikham Jan 20, 2021

atsikham mentioned this issue Jan 21, 2021

Fixed rabbitmq upgrade #1989

Merged

przemyslavic self-assigned this Jan 26, 2021

przemyslavic mentioned this issue Jan 26, 2021

[BUG] RabbitMQ 3.7.10 fails on upgrade to 3.8.9: 'rabbitmqctl version' command not found. #1984

Closed

mkyc closed this as completed Jan 26, 2021