Ansible Automation Platform Troubleshooting Guide

GitHub repo: https://github.com/myllynen/aap-troubleshooting-guide
Themed page: https://myllynen.github.io/aap-troubleshooting-guide

Introduction

This page provides basic Ansible Automation Platform (AAP) troubleshooting tips and tricks.

For the Red Hat guide on troubleshooting Ansible Automation Platform, see Troubleshooting Ansible Automation Platform.

For the Red Hat Ansible automation controller troubleshooting docs, see the AAP Troubleshooting page.

For the generic Red Hat Ansible automation controller user guide, see the AAP User Guide.

Ansible Automation Platform Setup

To understand an Ansible Automation Platform setup, check its topology view, instances, and settings pages in the Web UI. The location of these pages would be something like:

When working on the command line, use the following command to display AAP instances and topology:

awx-manage list_instances

Troubleshooting Jobs

Jobs Overview

To see currently running jobs, visit the jobs page on the Web UI:

Jobs of type Workflow Job are parent jobs that run the configured number of actual jobs in parallel; see Job Slicing in the corresponding Template → Details page. Jobs are also subject to AAP-wide default configurations; see Settings → Job settings. Pay special attention to the Job Type parameter in the Template → Details page and always use Check or Run as appropriate.

Finding Relevant Jobs

To find jobs related to a certain host or group on the Web UI, use the filtering capabilities of the automation controller. For example, to find all the jobs where the limit included server23, use the following URL:

https://aap.example.com/#/jobs?job.job__limit__contains=server23
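When checking several hosts, the filter URL can be generated instead of typed by hand. The following is a minimal sketch that only reproduces the URL pattern shown above; the function name `jobs_url_for_limit` and the `AAP_URL` variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Base URL of the controller; aap.example.com is the placeholder used in this guide.
AAP_URL="${AAP_URL:-https://aap.example.com}"

# Print the Web UI jobs URL filtered to jobs whose limit contains the given string.
# The argument is inserted as-is, no URL encoding is applied.
jobs_url_for_limit() {
    printf '%s/#/jobs?job.job__limit__contains=%s\n' "$AAP_URL" "$1"
}

jobs_url_for_limit server23
```

Host or group names containing characters that are special in URLs would need to be encoded before being passed to the function.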

Investigating an Individual Job

For details of a running or past job, navigate to Jobs → Number/Name → Details. The page shows the following information:

  • Project
  • Inventory
  • Job status
  • Verbosity
  • Name of the Execution Node
  • Revision (can be used to check the contents in a git repo)

For the job output, check the Output tab of the job. Hint: pressing Ctrl+- in your browser zooms out the page, which might make the output slightly easier to read. Another option is to download the output and open it in a text editor (other than Notepad, which doesn't handle Unix text files properly). A third option, and sometimes the best one, is to open the API page, which shows the output in full page mode. For instance (replace the job number as appropriate):

https://aap.example.com/api/v2/jobs/1234/stdout/

The output can be downloaded in plain text format by adding ?format=txt to the end of the URL.
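The plain text download can also be scripted. The sketch below builds the URL described above and fetches it with curl. It assumes that the `AAP_TOKEN` environment variable holds a valid OAuth2 token for the controller; the function names and both variables are hypothetical, and the authentication method may differ in your environment.

```shell
#!/bin/sh
# Base URL of the controller; aap.example.com is the placeholder used in this guide.
AAP_URL="${AAP_URL:-https://aap.example.com}"

# Print the API URL for the plain text output of the given job number.
job_stdout_url() {
    printf '%s/api/v2/jobs/%s/stdout/?format=txt\n' "$AAP_URL" "$1"
}

# Download the output of the given job to job-<number>.txt.
# Assumption: AAP_TOKEN contains an OAuth2 token accepted by the controller.
download_job_stdout() {
    curl --silent --show-error --fail \
        --header "Authorization: Bearer ${AAP_TOKEN}" \
        --output "job-$1.txt" \
        "$(job_stdout_url "$1")"
}

job_stdout_url 1234
```

To save the output of job 1234, call `download_job_stdout 1234` and open the resulting job-1234.txt in a text editor.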

Debugging a Job

If a job is not completing as expected, it may be a good idea to increase verbosity and disable job slicing to better see what is going on. Visit the job template page, set Job Slicing to 1, and apply a suitable value for Verbosity. Note that high verbosity values (3 and above) may in some cases cause secrets to be logged. According to Ansible developers this is intended. A reasonable starting verbosity level is often 2.

If one or a few hosts in the current inventory are known to be problematic, they can be (temporarily) excluded by editing the template and adding the offending hosts to the Limit string, each prefixed by an exclamation mark (!).
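As an illustration of what such a Limit string looks like, the sketch below joins a base pattern and a list of excluded hosts with colons, which is one of the separators Ansible accepts in host patterns. The function name `build_limit`, the base pattern `all`, and the host name server42 are assumptions made for the example.

```shell
#!/bin/sh
# Build a Limit string: the base pattern followed by each excluded host,
# prefixed with "!" and joined with ":".
# Note: plain sh has no local variables, so "limit" and "host" are global.
build_limit() {
    limit="$1"
    shift
    for host in "$@"; do
        # Single quotes keep "!" literal even in an interactive shell.
        limit="$(printf '%s:!%s' "$limit" "$host")"
    done
    printf '%s\n' "$limit"
}

build_limit all server23 server42
```

The printed string can be pasted into the Limit field of the template and removed again once the problematic hosts have been fixed.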

Investigating AAP Under the Hood

In some cases it might be necessary to check AAP and system logs on the AAP controller and/or execution node(s). First, check the AAP topology and, where available, the job details to identify the relevant nodes.

On the command line awx-manage list_instances can be used to display AAP instances and topology. For details about automation mesh use the receptorctl command. For instance, to display basics about the current node and automation mesh setup and status use:

receptorctl --socket /run/awx-receptor/receptor.sock status

To ping a node as part of the automation mesh use:

receptorctl --socket /run/awx-receptor/receptor.sock ping en-1.example.com
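To check several mesh nodes in one go, the ping command above can be wrapped in a loop. This is a minimal sketch that assumes receptorctl exits with a non-zero status when a ping fails; the function name `ping_mesh_nodes`, the `RECEPTOR_SOCKET` variable, and the node name en-2.example.com are hypothetical.

```shell
#!/bin/sh
# Receptor control socket path, as used in the commands above.
RECEPTOR_SOCKET="${RECEPTOR_SOCKET:-/run/awx-receptor/receptor.sock}"

# Ping each given node and print one status line per node.
# Returns non-zero if at least one ping failed.
# Assumption: "receptorctl ping" exits non-zero when the node does not reply.
ping_mesh_nodes() {
    failed=0
    for node in "$@"; do
        if receptorctl --socket "$RECEPTOR_SOCKET" ping "$node" >/dev/null 2>&1; then
            printf 'OK      %s\n' "$node"
        else
            printf 'FAILED  %s\n' "$node"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (run on a node that is part of the automation mesh):
#   ping_mesh_nodes en-1.example.com en-2.example.com
```

For a node reported as FAILED, rerun the plain receptorctl ping command for that node to see the actual error message.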

On the controller nodes the most relevant logs are typically in /var/log/tower.

On the execution nodes the most relevant logs are typically /var/log/messages and the files in /var/log/receptor.
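A quick first pass over these log locations can be done with grep. The sketch below makes no assumption about individual log file names or formats; it searches every text file under a directory for a pattern and shows the last matches. The function name `search_logs` and the default of 20 lines are arbitrary choices for the example, and reading the logs usually requires root privileges.

```shell
#!/bin/sh
# Print the last matching lines (case-insensitive) from all text files
# under a log directory, or from a single log file.
# Arguments: <path> <pattern> [number of lines, default 20]
search_logs() {
    dir="$1"
    pattern="$2"
    count="${3:-20}"
    # -r: recurse into directories, -I: skip binary (e.g. compressed) files,
    # -i: ignore case.
    grep -rIi -- "$pattern" "$dir" 2>/dev/null | tail -n "$count"
}

# Example calls:
#   search_logs /var/log/tower error        # on a controller node
#   search_logs /var/log/receptor error     # on an execution node
```

Searching for a job number or a host name instead of "error" can help to narrow the results down to a specific failing job.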

Additional Information

See Also

See also https://github.com/myllynen/rhel-troubleshooting-guide.

See also https://github.com/myllynen/aap-automation.