Backport #12074 to branch/v8 #12432

Merged 1 commit on May 6, 2022.
139 changes: 114 additions & 25 deletions docs/pages/setup/admin/troubleshooting.mdx

---
title: Troubleshooting
description: Troubleshooting and Collecting Metrics of Teleport Processes
---

In this guide, we will explain how to address issues or unexpected behavior in your
Teleport cluster.

You can use these steps to get more visibility into the `teleport` process so
you can troubleshoot <ScopedBlock scope={["oss", "enterprise"]}>the Auth
Service, Proxy Service, and </ScopedBlock> resource-specific services such as
the Application Service and Database Service.

<Notice scope={["cloud"]} type="tip">

These instructions apply to `teleport` processes running on your own
infrastructure to provide access to specific resources. In Teleport Cloud, the
Auth and Proxy Services are monitored and managed for you.

</Notice>

## Prerequisites

(!docs/pages/includes/edition-prereqs-tabs.mdx!)

<ScopedBlock scope="cloud">
- A host where you have installed and configured the `teleport` binary.
</ScopedBlock>

(!docs/pages/includes/tctl.mdx!)

## Step 1/3. Enable verbose logging

To diagnose problems, you can configure the `teleport` process to run with
verbose logging enabled by passing it the `-d` flag. `teleport` will write logs
to stderr.
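
As a minimal sketch, you can run `teleport start` with the `-d` flag and keep a copy of the verbose logs in a file you can share later. The helper name `start_teleport_debug`, the default configuration path `/etc/teleport.yaml`, and the log file name `teleport-debug.log` are assumptions for illustration; adjust them for your host and run the helper from a shell with the privileges `teleport` normally needs.

```shell
# Hypothetical helper: run `teleport start` with verbose logging (-d) and
# keep a copy of everything the process writes to stderr.
# Assumes the config and log file paths below; pass your own to override them.
start_teleport_debug() {
  local config="${1:-/etc/teleport.yaml}"
  local log_file="${2:-teleport-debug.log}"
  # Merge stderr into stdout so that tee can both display and save the logs.
  teleport start -d --config="$config" 2>&1 | tee "$log_file"
}
```

For example, `start_teleport_debug /etc/teleport.yaml /tmp/teleport-debug.log` runs Teleport in the foreground until you press Ctrl-C. If `teleport` already runs as a service on this host, stop it first so the two processes do not compete for the same ports and data directory.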

Logs will resemble the following (these logs were printed while joining a Node
to a cluster, then terminating the `teleport` process on the Node):

```txt
DEBU [NODE:PROX] Agent connected to proxy: [aee1241f-0f6f-460e-8149-23c38709e46d.tele.example.com aee1241f-0f6f-460e-8149-23c38709e46d teleport-proxy-us-west-2-6db8db844c-ftmg9.tele.example.com teleport-proxy-us-west-2-6db8db844c-ftmg9 localhost 127.0.0.1 ::1 tele.example.com 100.92.90.42 remote.kube.proxy.teleport.cluster.local]. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:414
DEBU [NODE:PROX] Changing state connecting -> connected. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:210
DEBU [NODE:PROX] Discovery request channel opened: teleport-discovery. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:526
DEBU [NODE:PROX] handleDiscovery requests channel. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:544
DEBU [NODE:PROX] Pool is closing agent. leaseID:2 target:tele.example.com:11106 reversetunnel/agentpool.go:238
DEBU [NODE:PROX] Pool is closing agent. leaseID:3 target:tele.example.com:11106 reversetunnel/agentpool.go:238
```

Debug logs include the file and line number of the code that emitted the log, so
you can investigate (or report) what a `teleport` process was doing before it ran into
problems.
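
Because every debug line ends with a `path/file.go:line` location, you can narrow a large log down to one subsystem. This is a sketch assuming you saved the logs to a file; `logs_from_source` is a hypothetical helper name, and `reversetunnel/` is simply the package that appears in the sample above.

```shell
# Hypothetical helper: print only the log lines emitted from a given source
# path, such as "reversetunnel/" or "reversetunnel/agentpool.go".
# Debug lines include "<path>:<line number>", so we match the path followed
# by any non-space characters, a colon, and digits.
logs_from_source() {
  local source_path="$1"
  local log_file="$2"
  grep -E -- "${source_path}[^ ]*:[0-9]+" "$log_file"
}
```

For example, `logs_from_source reversetunnel/agentpool.go teleport-debug.log` would keep only the "Pool is closing agent" lines from the sample above.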

<Notice
type="warning"
>

We do not recommend running Teleport in production with verbose logging enabled,
since it generates a substantial amount of data.

Sometimes you may want to reset [`teleport`](../reference/cli.mdx#teleport) to a
clean state. You can do this by erasing everything under the `data_dir`
directory, which defaults to `/var/lib/teleport/`.

</Notice>
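
If you do reset `data_dir`, a cautious approach is to stop `teleport` first and keep a copy of the old contents. The state stored there (such as the host's credentials) is lost after the reset, so the process may need to join the cluster again. The sketch below assumes the process is already stopped; `reset_teleport_data_dir` is a hypothetical helper, and its default path matches the `/var/lib/teleport/` default mentioned above.

```shell
# Hypothetical helper: back up and then empty a Teleport data directory.
# Assumes the teleport process is already stopped and that you have write
# access to the directory and its parent.
reset_teleport_data_dir() {
  local data_dir="${1:-/var/lib/teleport}"
  local backup="${data_dir%/}.bak.$(date +%Y%m%d%H%M%S)"
  # Copy everything, preserving permissions and ownership where possible.
  cp -a "$data_dir" "$backup" || return 1
  # Delete the contents but keep the directory itself, so its own
  # ownership and permissions stay unchanged.
  find "$data_dir" -mindepth 1 -delete || return 1
  # Print the backup location so you can inspect or restore it later.
  echo "$backup"
}
```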

## Step 2/3. Generate a debug dump

The `teleport` binary is a Go program. Go programs run concurrent work in
lightweight threads called **goroutines**, which the Go runtime schedules onto
operating system threads. You can get a goroutine dump of a running `teleport`
process by sending it a `USR1` signal.

This is especially useful for troubleshooting a `teleport` process that appears
stuck, since you can see which goroutines are blocked and why. For example,
goroutines often communicate using **channels**, and a goroutine dump indicates
whether a goroutine is waiting to send or receive on a channel.

Locate the PID of a running `teleport` daemon:

```code
# Locate teleport process PID
$ pidof teleport
235119
```

To generate a goroutine dump, send a `USR1` signal to a `teleport` process:

```code
$ kill -USR1 $(pidof teleport)
```

`teleport` will print diagnostic information, including runtime statistics and a
goroutine dump, to the logs:

```txt
INFO [PROC:1] Got signal "user defined signal 1", logging diagnostic info to stderr. service/signals.go:99
Runtime stats
goroutines: 64
OS threads: 10
GOMAXPROCS: 2
num CPU: 2
...
Goroutines
runtime/pprof.writeGoroutineStacks(0x3c2ffc0, 0xc0001a8010, 0xc001011a38, 0x4bcf
...
```
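
Goroutine dumps are long. As a sketch, assuming you saved the dump to a file, the hypothetical helper below prints the header of each goroutine that is waiting on a channel operation or in a `select` statement. It relies on the standard Go stack-trace format, in which each goroutine starts with a line such as `goroutine 42 [chan receive]:`. The pattern is not anchored to the start of the line, so it also works if a log collector prefixes each line.

```shell
# Hypothetical helper: list goroutines blocked on channels in a saved dump.
# In Go stack traces, each goroutine begins with a header like
#   goroutine 42 [chan receive, 3 minutes]:
# The bracketed text is the goroutine's state; "chan send", "chan receive",
# and "select" mean it is waiting on one or more channels.
blocked_on_channels() {
  local dump_file="$1"
  grep -E 'goroutine [0-9]+ \[(chan (send|receive)|select)' "$dump_file"
}
```

A goroutine that shows the same blocked state across several dumps taken a few seconds apart is a good candidate to mention when you ask for help.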

<Notice type="tip">

You can print a goroutine dump without enabling verbose logging.

</Notice>

## Step 3/3. Ask for help

Once you have collected verbose logs and a goroutine dump from your `teleport`
process, you can use this information to get help from the Teleport community and
Support team.

### Collect your Teleport version

Determine the version of the `teleport` process you are investigating.

```code
$ teleport version
Teleport v8.3.7 git:v8.3.7-0-ga8d066935 go1.17.3
```

You can also collect the versions of the Teleport Auth Service, Proxy
Service, and client tools to rule out version compatibility issues.

<ScopedBlock scope="cloud">

To see the version of the Auth Service and Proxy Service, run the following
command:

```code
$ tctl status
Cluster mytenant.teleport.sh
Version (=cloud.version=)
Host CA never updated
User CA never updated
Jwt CA never updated
CA pin (=presets.ca_pin=)
```

</ScopedBlock>

Get the versions of your client tools:

```code
$ tctl version
Teleport v9.0.4 git: go1.18
$ tsh version
Teleport v9.0.4 git: go1.18
```
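
When you post a question, it helps to paste all of these versions together. A small sketch, assuming the `teleport`, `tctl`, and `tsh` binaries are on your `PATH` on the host where you run it; `collect_teleport_versions` is a hypothetical helper, and it reports any tool it cannot find instead of failing.

```shell
# Hypothetical helper: print the version of each Teleport binary, one per line,
# in a form you can paste into a support request or forum post.
collect_teleport_versions() {
  local bin
  for bin in teleport tctl tsh; do
    if command -v "$bin" >/dev/null 2>&1; then
      printf '%s: %s\n' "$bin" "$("$bin" version 2>&1)"
    else
      printf '%s: not found on PATH\n' "$bin"
    fi
  done
}
```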

### Pose your question

<Tabs>
<TabItem scope={["cloud", "enterprise"]} label="Commercial">
If you need help, please ask on our [community forum](https://github.com/gravitational/teleport/discussions). You can also open an [issue on GitHub](https://github.com/gravitational/teleport/issues) or create a ticket through the [customer dashboard](https://dashboard.gravitational.com/web/login).
</TabItem>
</Tabs>

## Further reading

This guide showed how to investigate issues with the `teleport` process. To see
how you can monitor more general health and performance data from your Teleport
cluster, read our [Teleport Diagnostics](../reference/metrics.mdx) guide.

For additional sources of Teleport support, please see the
[Teleport Support and Education Center](https://goteleport.com/support/).