This repository has been archived by the owner on Feb 9, 2024. It is now read-only.

(5.5) Better errors for teleport nodes pre-check #1874

Merged (2 commits) on Jul 16, 2020

Conversation

r0mant
Contributor

@r0mant r0mant commented Jul 14, 2020

Description

Before deploying agents on the cluster nodes (prior to an upgrade or any other operation that needs agents, or when running gravity agent deploy), we verify that all Teleport nodes are available. This PR improves the case where some Teleport nodes are unavailable, with better error messages and clearer instructions:

  • Detect whether the cluster is being upgraded from 5.5.40, which is affected by the Teleport token issue, and return an appropriate error message linking to the KB article.
  • In all other cases, return a clearer message with more specific instructions on how to check Teleport.

Type of change

  • Internal change (not necessarily a bug fix or a new feature)

Linked tickets and other PRs

TODOs

  • Self-review the change
  • Perform manual testing
  • Address review feedback

Testing done

Prep

Install a cluster and shutdown Teleport on one of the nodes.

Upgrade from 5.5.40

ubuntu@node-1:~/upgrade$ sudo ./gravity upgrade --manual
Tue Jul 14 21:36:15 UTC	Upgrading cluster from 5.5.40 to 5.5.50-dev.2
Tue Jul 14 21:36:16 UTC	Deploying agents on cluster nodes
Tue Jul 14 21:36:16 UTC	Encountered error, will shutdown agents
Tue Jul 14 21:36:16 UTC	Shutting down the agents
[ERROR]: Teleport is unavailable on the following cluster nodes: node-2.

Gravity version 5.5.40 you're currently running has a known issue with Teleport
using an incorrect auth token on the joined nodes preventing Teleport nodes from
joining.

This cluster may be affected by this issue if new nodes were joined to it after
upgrade to 5.5.40. See the following KB article for remediation actions:

https://community.gravitational.com/t/recover-teleport-nodes-failing-to-join-due-to-bad-token/649

After fixing the issue, "./gravity status" can be used to confirm the status of
Teleport on each node in the "remote access" field.

Once all Teleport nodes have joined successfully, launch the upgrade again.

Upgrade from other version

ubuntu@node-1:~/upgrade$ sudo ./gravity upgrade --manual
Tue Jul 14 21:38:34 UTC	Upgrading cluster from 5.5.41 to 5.5.50-dev.2
Tue Jul 14 21:38:34 UTC	Deploying agents on cluster nodes
Tue Jul 14 21:38:34 UTC	Encountered error, will shutdown agents
Tue Jul 14 21:38:34 UTC	Shutting down the agents
[ERROR]: Teleport is unavailable on the following cluster nodes: node-2.

Please check the status and logs of Teleport systemd service on the specified
nodes and make sure it's running:

systemctl status gravity__gravitational.io__teleport__3.0.5
journalctl -u gravity__gravitational.io__teleport__3.0.5 --no-pager

After fixing the issue, "./gravity status" can be used to confirm the status of
Teleport on each node using "remote access" field.

Once all Teleport nodes are running, launch the upgrade again.

@r0mant r0mant requested review from bernardjkim and a team July 14, 2020 21:49
@r0mant r0mant self-assigned this Jul 14, 2020
lib/ops/opsservice/versions.go Outdated