diff --git a/docs/pages/setup/admin/troubleshooting.mdx b/docs/pages/setup/admin/troubleshooting.mdx
index 57d9171fc735c..bd2127bcde10f 100644
--- a/docs/pages/setup/admin/troubleshooting.mdx
+++ b/docs/pages/setup/admin/troubleshooting.mdx
@@ -2,43 +2,67 @@
 title: Troubleshooting
 description: Troubleshooting and Collecting Metrics of Teleport Processes
 ---
-
-These instructions apply to `teleport` processes running on your own
-infrastructure in order to access specific resources. In Teleport Cloud, the
-Auth and Proxy Services are monitored and managed for you.
+In this guide, we will explain how to address issues or unexpected behavior in your
+Teleport cluster.
-
+You can use these steps to get more visibility into the `teleport` process so
+you can troubleshoot the Auth
+Service, Proxy Service, and resource-specific services such as
+the Application Service and Database Service.
+
+## Prerequisites
+
+(!docs/pages/includes/edition-prereqs-tabs.mdx!)
+
+
+- A host where you have installed and configured the `teleport` binary.
+
+
+(!docs/pages/includes/tctl.mdx!)
+
+## Step 1/3. Enable verbose logging
-## Troubleshooting
+To diagnose problems, you can configure the `teleport` process to run with
+verbose logging enabled by passing it the `-d` flag. `teleport` will write logs
+to stderr.
-To diagnose problems you can configure the `teleport` process to run with
-verbose logging enabled by passing it the `-d` flag.
+Logs will resemble the following (these logs were printed while joining a Node
+to a cluster, then terminating the `teleport` process on the Node):
+
+```
+DEBU [NODE:PROX] Agent connected to proxy: [aee1241f-0f6f-460e-8149-23c38709e46d.tele.example.com aee1241f-0f6f-460e-8149-23c38709e46d teleport-proxy-us-west-2-6db8db844c-ftmg9.tele.example.com teleport-proxy-us-west-2-6db8db844c-ftmg9 localhost 127.0.0.1 ::1 tele.example.com 100.92.90.42 remote.kube.proxy.teleport.cluster.local]. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:414
+DEBU [NODE:PROX] Changing state connecting -> connected. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:210
+DEBU [NODE:PROX] Discovery request channel opened: teleport-discovery. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:526
+DEBU [NODE:PROX] handleDiscovery requests channel. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:544
+DEBU [NODE:PROX] Pool is closing agent. leaseID:2 target:tele.example.com:11106 reversetunnel/agentpool.go:238
+DEBU [NODE:PROX] Pool is closing agent. leaseID:3 target:tele.example.com:11106 reversetunnel/agentpool.go:238
+```
+
+Debug logs include the file and line number of the code that emitted the log, so
+you can investigate (or report) what a `teleport` process was doing before it ran into
+problems.
- It is not recommended to run Teleport in production with verbose logging as it generates a substantial amount of data.
-
+ It is not recommended to run Teleport in production with verbose logging as it
+ generates a substantial amount of data.
-Sometimes you may want to reset [`teleport`](../reference/cli.mdx#teleport) to a
-clean state. This can be accomplished by erasing everything under the `data_dir`
-directory, which defaults to `/var/lib/teleport/`.
+
-## Debug dump
+## Step 2/3. Generate a debug dump
-You can get a goroutine dump of a running `teleport` process by sending it
-`USR1` signal.
+The `teleport` binary is a Go program. Go programs assign work to CPU threads
+using an abstraction called a **goroutine**. You can get a goroutine dump of a
+running `teleport` process by sending it a `USR1` signal.
-Locate a running `teleport` daemon PID:
+This is especially useful for troubleshooting a `teleport` process that appears
+stuck, since you can see which goroutine is blocked and why. For example,
+goroutines often communicate using **channels**, and a goroutine dump indicates
+whether a goroutine is waiting to send or receive on a channel.
-```code
-# Locate teleport process PID
-$ pidof teleport
-235119
-```
-
-Send a `USR1` signal to a `teleport` process:
+To generate a goroutine dump, send a `USR1` signal to a `teleport` process:
 ```code
 $ kill -USR1 $(pidof teleport)
@@ -50,6 +74,11 @@ the logs:
 ```txt
 INFO [PROC:1] Got signal "user defined signal 1", logging diagnostic info to stderr. service/signals.go:99
 Runtime stats
+goroutines: 64
+OS threads: 10
+GOMAXPROCS: 2
+num CPU: 2
+...
 goroutines: 84
 ...
 Goroutines
@@ -59,7 +88,58 @@ runtime/pprof.writeGoroutineStacks(0x3c2ffc0, 0xc0001a8010, 0xc001011a38, 0x4bcf
 ...
 ```
-## Getting help
+
+
+You can print a goroutine dump without enabling verbose logging.
+
+
+
+## Step 3/3. Ask for help
+
+Once you have collected verbose logs and a goroutine dump from your `teleport`
+binary, you can use this information to get help from the Teleport community and
+Support team.
+
+### Collect your Teleport version
+
+Determine the version of the `teleport` process you are investigating.
+
+```code
+$ teleport version
+Teleport v8.3.7 git:v8.3.7-0-ga8d066935 go1.17.3
+```
+
+You can also collect the versions of the Teleport Auth Service, Proxy
+Service, and client tools to rule out version compatibility issues.
+
+
+
+To see the version of the Auth Service and Proxy Service, run the following
+command:
+
+```code
+$ tctl status
+Cluster mytenant.teleport.sh
+Version (=cloud.version=)
+Host CA never updated
+User CA never updated
+Jwt CA never updated
+CA pin (=presets.ca_pin=)
+```
+
+
+
+Get the versions of your client tools:
+
+```code
+$ tctl version
+Teleport v9.0.4 git: go1.18
+$ tsh version
+Teleport v9.0.4 git: go1.18
+```
+
+### Pose your question
+
 If you need help, please ask on our [community forum](https://github.com/gravitational/teleport/discussions). You can also open an [issue on GitHub](https://github.com/gravitational/teleport/issues) or create a ticket through the [customer dashboard](https://dashboard.gravitational.com/web/login).
@@ -72,3 +152,12 @@ For more information about custom features, or to try our [Enterprise edition](.
+## Further reading
+
+This guide showed how to investigate issues with the `teleport` process. To see
+how you can monitor more general health and performance data from your Teleport
+cluster, read our [Teleport Diagnostics](../reference/metrics.mdx) guide.
+
+For additional sources of Teleport support, please see the
+[Teleport Support and Education Center](https://goteleport.com/support/).
+