[BUG] RabbitMQ fails on upgrade when 2 nodes are specified that are not clustered. #1979

Closed
przemyslavic opened this issue Jan 18, 2021 · 3 comments

@przemyslavic
Collaborator

przemyslavic commented Jan 18, 2021

Describe the bug
The RabbitMQ upgrade process fails during the upgrade to v3.8.9.
The failing configuration has at least 2 non-clustered nodes.

How to reproduce
Steps to reproduce the behavior:

  1. Deploy a 0.8 cluster with the RabbitMQ component enabled (at least 2 VMs)
  2. Upgrade the cluster to the develop branch.

Expected behavior
The cluster has been upgraded successfully.

Config files

---
kind: epiphany-cluster
name: default
provider: <provider>
specification:
  components:
    rabbitmq:
      count: 2
---
kind: configuration/rabbitmq
title: "RabbitMQ"
provider: <provider>
name: default
specification:
  rabbitmq_plugins:
    - rabbitmq_management_agent
    - rabbitmq_management
  cluster:
    is_clustered: false

Environment

  • Cloud provider: [all]
  • OS: [all]

Additional context
The upgrade process fails on TASK [upgrade : RabbitMQ | Join a node to the cluster].

2021-01-17T00:53:40.4385876Z 00:53:40 INFO cli.engine.ansible.AnsibleCommand - TASK [upgrade : RabbitMQ | Join a node to the cluster] *************************
2021-01-17T00:53:40.7005828Z 00:53:40 INFO cli.engine.ansible.AnsibleCommand - skipping: [ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com]
2021-01-17T00:53:42.1028562Z 00:53:42 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@ec2-xx-xx-xx-xx"], "delta": "0:00:00.541081", "end": "2021-01-17 00:53:41.997904", "msg": "non-zero return code", "rc": 69, "start": "2021-01-17 00:53:41.456823", "stderr": "Error: unable to perform an operation on node '[email protected]'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node [email protected]\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['[email protected]']\n\[email protected]:\n  * connected to epmd (port 4369) on ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com\n  * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic \n  * TCP connection succeeded but Erlang distribution failed \n\n  * Authentication failed (rejected by the remote node), please check the Erlang cookie\n\n\nCurrent node details:\n * node name: 'rabbitmqcli-2076-rabbit@ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: <hash>==", "stderr_lines": ["Error: unable to perform an operation on node '[email protected]'. Please see diagnostics information and suggestions below.", "", "Most common reasons for this are:", "", " * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)", " * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)", " * Target node is not running", "", "In addition to the diagnostics info below:", "", " * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more", " * Consult server logs on node [email protected]", " * If target node is configured to use long node names, don't forget to use --longnames with CLI tools", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['[email protected]']", "", "[email protected]:", "  * connected to epmd (port 4369) on ec2-xx-xx-xx-xx.eu-west-3.compute.amazonaws.com", "  * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic ", "  * TCP connection succeeded but Erlang distribution failed ", "", "  * Authentication failed (rejected by the remote node), please check the Erlang cookie", "", "", "Current node details:", " * node name: 'rabbitmqcli-2076-rabbit@ec2-yy-yy-yy-yy.eu-west-3.compute.amazonaws.com'", " * effective user's home directory: /var/lib/rabbitmq", " * Erlang cookie hash: <hash>=="], "stdout": "Clustering node [email protected] with rabbit@ec2-xx-xx-xx-xx", "stdout_lines": ["Clustering node [email protected] with rabbit@ec2-xx-xx-xx-xx"]}

The command rabbitmqctl join_cluster is run even though we are not creating a cluster. The upgrade role needs to check the value of the is_clustered parameter in the specification before running the clustering tasks.
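
For illustration only, a guard of this kind could look roughly like the task below. This is a sketch, not the actual fix: it assumes the configuration/rabbitmq document is available to the upgrade role as a variable named specification, and rabbitmq_cluster_target_node is a hypothetical variable holding the name of the node to join.

```yaml
# Sketch under stated assumptions; not the task from the upgrade role.
# Assumption: `specification` holds the configuration/rabbitmq document.
# Hypothetical: `rabbitmq_cluster_target_node` is the node that others join.
- name: RabbitMQ | Join a node to the cluster
  command: rabbitmqctl join_cluster rabbit@{{ rabbitmq_cluster_target_node }}
  when:
    # Skip the join entirely when clustering is disabled in the specification
    - specification.cluster.is_clustered | bool
```

With is_clustered: false, as in the config above, Ansible would report the task as skipped on every node instead of calling rabbitmqctl join_cluster.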

@atsikham
Contributor

Verified that the upgrade role does not check the non-clustered option. This task was previously added as a dependency for backporting: #1933, #1934, #1935.

@atsikham atsikham self-assigned this Jan 19, 2021
@atsikham atsikham removed their assignment Jan 19, 2021
to-bar added a commit to to-bar/epiphany that referenced this issue Jan 19, 2021
to-bar added a commit that referenced this issue Jan 19, 2021
* Deprecate Elasticsearch OSS v6

* Add #1979 to known issues
@atsikham atsikham assigned atsikham and unassigned atsikham Jan 20, 2021
@przemyslavic przemyslavic self-assigned this Jan 26, 2021
@przemyslavic
Collaborator Author

Tested together with #1984.
✅ upgrade from 3.7.10 to 3.8.9
✅ upgrade from 3.8.3 to 3.8.9
✅ 1 RabbitMQ node
✅ 2 RabbitMQ nodes
✅ 2 RabbitMQ nodes clustered
Azure/AWS x Ubuntu/RHEL

@atsikham
Contributor

PR #1989

@mkyc mkyc closed this as completed Jan 26, 2021