
Vanilla install 11.0.0 fails #6792

Closed · JG127 opened this issue Apr 22, 2020 · 59 comments

@JG127 commented Apr 22, 2020

ISSUE TYPE
  • Bug Report
SUMMARY

A fresh install of the 11.0.0 release doesn't work, even though the installation instructions are followed. There are SQL errors and a recurring error about clustering.

ENVIRONMENT
  • AWX version: 11.0.0
  • AWX install method: docker on linux
  • Ansible version: 2.7.9
  • Python version: 2.7.17
  • Operating System: Linux Mint 19.3
  • Web Browser: n/a
  • Docker version: 19.03.8, build afacb8b7f0
  • Docker Compose version: 1.25.5
STEPS TO REPRODUCE
  1. git clone https://github.com/ansible/awx.git
  2. cd awx
  3. git checkout 11.0.0
  4. cd installer
  5. rm -rf ~/.awx (make certain it is a clean install, empty database)
  6. docker stop $(docker ps -q)
  7. docker rm $(docker ps -qa)
  8. docker rmi -f $(docker image ls -q)
  9. docker system prune -f
  10. virtualenv -p python2 venv
  11. source venv/bin/activate
  12. pip install ansible
  13. pip install docker-compose
  14. ansible-playbook -i inventory install.yml

The installation playbook runs without apparent errors. However, the Docker Compose logs contain loads of SQL errors and cluster errors, as shown below.

The procedure was repeated with the line "dockerhub_base=ansible" commented out in the inventory file, to make certain the AWX Docker images are built locally and in sync with the installer. The very same errors happen.
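
For reference, a minimal sketch of that inventory change (the file is installer/inventory in the checked-out tree):

# installer/inventory
# commenting this out forces the playbook to build the awx_web/awx_task images locally
# dockerhub_base=ansible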

EXPECTED RESULTS

No errors in the logs and a fully functional application.

ACTUAL RESULTS

The logs are filling with errors and the application is not fully functional. Sometimes I'm getting an angry potato logo. I've attached a screenshot. What is it used for? :-)

The odd thing, however, is that when there is no angry potato logo the application seems to be functional (i.e. management jobs can be run successfully), despite the huge number of errors in the logs.

When there is an angry potato logo I can log in but not run jobs.

ADDITIONAL INFORMATION

The SQL statement errors below are repeated very frequently: the relations "conf_setting" and "main_instance" do not exist.

awx_postgres | 2020-04-22 07:14:18.999 UTC [43] ERROR:  relation "conf_setting" does not exist at character 158
awx_postgres | 2020-04-22 07:14:18.999 UTC [43] STATEMENT:  SELECT "conf_setting"."id", "conf_setting"."created", "conf_setting"."modified", "conf_setting"."key", "conf_setting"."value", "conf_setting"."user_id" FROM "conf_setting" WHERE ("conf_setting"."key" = 'OAUTH2_PROVIDER' AND "conf_setting"."user_id" IS NULL) ORDER BY "conf_setting"."id" ASC  LIMIT 1

awx_postgres | 2020-04-22 07:14:19.153 UTC [43] ERROR:  relation "main_instance" does not exist at character 24
awx_postgres | 2020-04-22 07:14:19.153 UTC [43] STATEMENT:  SELECT (1) AS "a" FROM "main_instance" WHERE "main_instance"."hostname" = 'awx'  LIMIT 1

This error about clustering is repeated very frequently:

awx_web      | Traceback (most recent call last):
awx_web      |   File "/usr/bin/awx-manage", line 8, in <module>
awx_web      |     sys.exit(manage())
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
awx_web      |     execute_from_command_line(sys.argv)
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
awx_web      |     utility.execute()
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
awx_web      |     self.fetch_command(subcommand).run_from_argv(self.argv)
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
awx_web      |     self.execute(*args, **cmd_options)
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
awx_web      |     output = self.handle(*args, **options)
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_wsbroadcast.py", line 128, in handle
awx_web      |     broadcast_websocket_mgr = BroadcastWebsocketManager()
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/wsbroadcast.py", line 151, in __init__
awx_web      |     self.local_hostname = get_local_host()
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/wsbroadcast.py", line 45, in get_local_host
awx_web      |     return Instance.objects.me().hostname
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
awx_web      |     raise RuntimeError("No instance found with the current cluster host id")
awx_web      | RuntimeError: No instance found with the current cluster host id

[screenshot attached: awx_upgrading]

@roedie commented Apr 22, 2020

I can confirm this is happening. I just did a clean install as well. Same problem.

@ryanpetrello (Contributor) commented Apr 22, 2020

Yep, I'm able to reproduce this. Looking into it.

@ryanpetrello (Contributor) commented Apr 22, 2020

Actually, I can't reproduce this; as the error message above suggested, it was just migrating (which took a minute).

@roedie commented Apr 22, 2020

Here's my asciinema. This is on a cleanly installed Debian 10 host with docker and ansible.

https://asciinema.org/a/b1jgaeSFWiv6jHkFmPpO8aLTI?t=13

This is the vars.yml:

postgres_data_dir: "/srv/pgdocker"
docker_compose_dir: "/srv/awxcompose"
pg_password: "pgpass"
admin_password: "adminpass"
secret_key: "secretkey"
project_data_dir: "/srv/awx/projects"

@bryanasdev000 commented Apr 22, 2020

I'm trying to set up AWX using docker-compose and I'm having the same problems as the OP, resulting in an infinite loop (30 minutes so far) of Ansible trying to perform the migrations. I will test again from scratch and report as soon as possible.

@roedie commented Apr 22, 2020

It never finishes the migrations on my hosts; at least, not within an hour. I still have it running, so I can have a look again tomorrow ;-)

@ryanpetrello (Contributor):

Do you see any errors related to migrations? What happens if you exec into the web container and run:

awx-manage migrate

by hand?

@JG127 (Author) commented Apr 23, 2020

Maybe unrelated to this issue, but release 11.1.0 has the same errors. After about 15 minutes of error messages it seems to resume its proper routine.

@JG127 (Author) commented Apr 23, 2020

$ docker-compose exec web bash
bash-4.4# awx-manage migrate
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
  No migrations to apply.

There must be a difference somewhere. The Python runtime environment, perhaps?
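
One way to compare the two environments from here is Django's showmigrations command, which awx-manage exposes (a sketch; run it in the containers of both installs and diff the output):

bash-4.4# awx-manage showmigrations | grep -c '\[X\]'   # count applied migrations
bash-4.4# awx-manage showmigrations | grep '\[ \]'      # list any unapplied ones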

@roedie commented Apr 23, 2020

Hmmm, I get different output than @JG127.

root@awx-test:~# docker exec -ti  261e78c819ad bash
bash-4.4# awx-manage migrate
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
  Applying main.0001_initial... OK
  Applying main.0002_squashed_v300_release... OK
  Applying main.0003_squashed_v300_v303_updates... OK
  Applying main.0004_squashed_v310_release... OK
  Applying conf.0001_initial... OK
  Applying conf.0002_v310_copy_tower_settings... OK
  Applying main.0005_squashed_v310_v313_updates... OK
  Applying main.0006_v320_release... OK
  Applying main.0007_v320_data_migrations... OK
  Applying main.0008_v320_drop_v1_credential_fields... OK
  Applying main.0009_v322_add_setting_field_for_activity_stream... OK
  Applying main.0010_v322_add_ovirt4_tower_inventory... OK
  Applying main.0011_v322_encrypt_survey_passwords... OK
  Applying main.0012_v322_update_cred_types... OK
  Applying main.0013_v330_multi_credential... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying conf.0003_v310_JSONField_changes... OK
  Applying conf.0004_v320_reencrypt... OK
  Applying conf.0005_v330_rename_two_session_settings... OK
  Applying conf.0006_v331_ldap_group_type... OK
  Applying sessions.0001_initial... OK
  Applying main.0014_v330_saved_launchtime_configs... OK
  Applying main.0015_v330_blank_start_args... OK
  Applying main.0016_v330_non_blank_workflow... OK
  Applying main.0017_v330_move_deprecated_stdout... OK
  Applying main.0018_v330_add_additional_stdout_events... OK
  Applying main.0019_v330_custom_virtualenv... OK
  Applying main.0020_v330_instancegroup_policies... OK
  Applying main.0021_v330_declare_new_rbac_roles... OK
  Applying main.0022_v330_create_new_rbac_roles... OK
  Applying main.0023_v330_inventory_multicred... OK
  Applying main.0024_v330_create_user_session_membership... OK
  Applying main.0025_v330_add_oauth_activity_stream_registrar... OK
  Applying oauth2_provider.0001_initial... OK
  Applying main.0026_v330_delete_authtoken... OK
  Applying main.0027_v330_emitted_events... OK
  Applying main.0028_v330_add_tower_verify... OK
  Applying main.0030_v330_modify_application... OK
  Applying main.0031_v330_encrypt_oauth2_secret... OK
  Applying main.0032_v330_polymorphic_delete... OK
  Applying main.0033_v330_oauth_help_text... OK
2020-04-23 08:00:25,638 INFO     rbac_migrations Computing role roots..
2020-04-23 08:00:25,640 INFO     rbac_migrations Found 0 roots in 0.000213 seconds, rebuilding ancestry map
2020-04-23 08:00:25,640 INFO     rbac_migrations Rebuild ancestors completed in 0.000008 seconds
2020-04-23 08:00:25,640 INFO     rbac_migrations Done.
  Applying main.0034_v330_delete_user_role... OK
  Applying main.0035_v330_more_oauth2_help_text... OK
  Applying main.0036_v330_credtype_remove_become_methods... OK
  Applying main.0037_v330_remove_legacy_fact_cleanup... OK
  Applying main.0038_v330_add_deleted_activitystream_actor... OK
  Applying main.0039_v330_custom_venv_help_text... OK
  Applying main.0040_v330_unifiedjob_controller_node... OK
  Applying main.0041_v330_update_oauth_refreshtoken... OK
2020-04-23 08:00:29,220 INFO     rbac_migrations Computing role roots..
2020-04-23 08:00:29,225 INFO     rbac_migrations Found 0 roots in 0.000184 seconds, rebuilding ancestry map
2020-04-23 08:00:29,225 INFO     rbac_migrations Rebuild ancestors completed in 0.000010 seconds
2020-04-23 08:00:29,225 INFO     rbac_migrations Done.
  Applying main.0042_v330_org_member_role_deparent... OK
  Applying main.0043_v330_oauth2accesstoken_modified... OK
  Applying main.0044_v330_add_inventory_update_inventory... OK
  Applying main.0045_v330_instance_managed_by_policy... OK
  Applying main.0046_v330_remove_client_credentials_grant... OK
  Applying main.0047_v330_activitystream_instance... OK
  Applying main.0048_v330_django_created_modified_by_model_name... OK
  Applying main.0049_v330_validate_instance_capacity_adjustment... OK
  Applying main.0050_v340_drop_celery_tables... OK
  Applying main.0051_v340_job_slicing... OK
  Applying main.0052_v340_remove_project_scm_delete_on_next_update... OK
  Applying main.0053_v340_workflow_inventory... OK
  Applying main.0054_v340_workflow_convergence... OK
  Applying main.0055_v340_add_grafana_notification... OK
  Applying main.0056_v350_custom_venv_history... OK
  Applying main.0057_v350_remove_become_method_type... OK
  Applying main.0058_v350_remove_limit_limit... OK
  Applying main.0059_v350_remove_adhoc_limit... OK
  Applying main.0060_v350_update_schedule_uniqueness_constraint... OK
2020-04-23 08:00:44,638 DEBUG    awx.main.models.credential adding Machine credential type
2020-04-23 08:00:44,660 DEBUG    awx.main.models.credential adding Source Control credential type
2020-04-23 08:00:44,673 DEBUG    awx.main.models.credential adding Vault credential type
2020-04-23 08:00:44,683 DEBUG    awx.main.models.credential adding Network credential type
2020-04-23 08:00:44,692 DEBUG    awx.main.models.credential adding Amazon Web Services credential type
2020-04-23 08:00:44,702 DEBUG    awx.main.models.credential adding OpenStack credential type
2020-04-23 08:00:44,713 DEBUG    awx.main.models.credential adding VMware vCenter credential type
2020-04-23 08:00:44,723 DEBUG    awx.main.models.credential adding Red Hat Satellite 6 credential type
2020-04-23 08:00:44,733 DEBUG    awx.main.models.credential adding Red Hat CloudForms credential type
2020-04-23 08:00:44,743 DEBUG    awx.main.models.credential adding Google Compute Engine credential type
2020-04-23 08:00:44,753 DEBUG    awx.main.models.credential adding Microsoft Azure Resource Manager credential type
2020-04-23 08:00:44,763 DEBUG    awx.main.models.credential adding GitHub Personal Access Token credential type
2020-04-23 08:00:44,773 DEBUG    awx.main.models.credential adding GitLab Personal Access Token credential type
2020-04-23 08:00:44,784 DEBUG    awx.main.models.credential adding Insights credential type
2020-04-23 08:00:44,794 DEBUG    awx.main.models.credential adding Red Hat Virtualization credential type
2020-04-23 08:00:44,804 DEBUG    awx.main.models.credential adding Ansible Tower credential type
2020-04-23 08:00:44,814 DEBUG    awx.main.models.credential adding OpenShift or Kubernetes API Bearer Token credential type
2020-04-23 08:00:44,823 DEBUG    awx.main.models.credential adding CyberArk AIM Central Credential Provider Lookup credential type
2020-04-23 08:00:44,833 DEBUG    awx.main.models.credential adding Microsoft Azure Key Vault credential type
2020-04-23 08:00:44,843 DEBUG    awx.main.models.credential adding CyberArk Conjur Secret Lookup credential type
2020-04-23 08:00:44,854 DEBUG    awx.main.models.credential adding HashiCorp Vault Secret Lookup credential type
2020-04-23 08:00:44,864 DEBUG    awx.main.models.credential adding HashiCorp Vault Signed SSH credential type
  Applying main.0061_v350_track_native_credentialtype_source... OK
  Applying main.0062_v350_new_playbook_stats... OK
  Applying main.0063_v350_org_host_limits... OK
  Applying main.0064_v350_analytics_state... OK
  Applying main.0065_v350_index_job_status... OK
  Applying main.0066_v350_inventorysource_custom_virtualenv... OK
  Applying main.0067_v350_credential_plugins... OK
  Applying main.0068_v350_index_event_created... OK
  Applying main.0069_v350_generate_unique_install_uuid... OK
2020-04-23 08:00:48,324 DEBUG    awx.main.migrations Migrating inventory instance_id for gce to gce_id
  Applying main.0070_v350_gce_instance_id... OK
  Applying main.0071_v350_remove_system_tracking... OK
  Applying main.0072_v350_deprecate_fields... OK
  Applying main.0073_v360_create_instance_group_m2m... OK
  Applying main.0074_v360_migrate_instance_group_relations... OK
  Applying main.0075_v360_remove_old_instance_group_relations... OK
  Applying main.0076_v360_add_new_instance_group_relations... OK
  Applying main.0077_v360_add_default_orderings... OK
  Applying main.0078_v360_clear_sessions_tokens_jt... OK
  Applying main.0079_v360_rm_implicit_oauth2_apps... OK
  Applying main.0080_v360_replace_job_origin... OK
  Applying main.0081_v360_notify_on_start... OK
  Applying main.0082_v360_webhook_http_method... OK
  Applying main.0083_v360_job_branch_override... OK
  Applying main.0084_v360_token_description... OK
  Applying main.0085_v360_add_notificationtemplate_messages... OK
  Applying main.0086_v360_workflow_approval... OK
  Applying main.0087_v360_update_credential_injector_help_text... OK
  Applying main.0088_v360_dashboard_optimizations... OK
  Applying main.0089_v360_new_job_event_types... OK
  Applying main.0090_v360_WFJT_prompts... OK
  Applying main.0091_v360_approval_node_notifications... OK
  Applying main.0092_v360_webhook_mixin... OK
  Applying main.0093_v360_personal_access_tokens... OK
  Applying main.0094_v360_webhook_mixin2... OK
  Applying main.0095_v360_increase_instance_version_length... OK
  Applying main.0096_v360_container_groups... OK
  Applying main.0097_v360_workflowapproval_approved_or_denied_by... OK
  Applying main.0098_v360_rename_cyberark_aim_credential_type... OK
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK
  Applying main.0101_v370_generate_new_uuids_for_iso_nodes... OK
  Applying main.0102_v370_unifiedjob_canceled... OK
  Applying main.0103_v370_remove_computed_fields... OK
  Applying main.0104_v370_cleanup_old_scan_jts... OK
  Applying main.0105_v370_remove_jobevent_parent_and_hosts... OK
  Applying main.0106_v370_remove_inventory_groups_with_active_failures... OK
  Applying main.0107_v370_workflow_convergence_api_toggle... OK
  Applying main.0108_v370_unifiedjob_dependencies_processed... OK
2020-04-23 08:01:26,793 DEBUG    rbac_migrations Migrating inventorysource to new organization field
2020-04-23 08:01:26,808 DEBUG    rbac_migrations Migrating jobtemplate to new organization field
2020-04-23 08:01:26,816 DEBUG    rbac_migrations Migrating project to new organization field
2020-04-23 08:01:26,822 DEBUG    rbac_migrations Migrating systemjobtemplate to new organization field
2020-04-23 08:01:26,822 DEBUG    rbac_migrations Class systemjobtemplate has no organization migration
2020-04-23 08:01:26,822 DEBUG    rbac_migrations Migrating workflowjobtemplate to new organization field
2020-04-23 08:01:26,829 DEBUG    rbac_migrations Migrating workflowapprovaltemplate to new organization field
2020-04-23 08:01:26,829 DEBUG    rbac_migrations Class workflowapprovaltemplate has no organization migration
2020-04-23 08:01:26,830 INFO     rbac_migrations Unified organization migration completed in 0.0366 seconds
2020-04-23 08:01:26,830 DEBUG    rbac_migrations Migrating adhoccommand to new organization field
2020-04-23 08:01:26,838 DEBUG    rbac_migrations Migrating inventoryupdate to new organization field
2020-04-23 08:01:26,846 DEBUG    rbac_migrations Migrating job to new organization field
2020-04-23 08:01:26,853 DEBUG    rbac_migrations Migrating projectupdate to new organization field
2020-04-23 08:01:26,861 DEBUG    rbac_migrations Migrating systemjob to new organization field
2020-04-23 08:01:26,861 DEBUG    rbac_migrations Class systemjob has no organization migration
2020-04-23 08:01:26,861 DEBUG    rbac_migrations Migrating workflowjob to new organization field
2020-04-23 08:01:26,869 DEBUG    rbac_migrations Migrating workflowapproval to new organization field
2020-04-23 08:01:26,869 DEBUG    rbac_migrations Class workflowapproval has no organization migration
2020-04-23 08:01:26,869 INFO     rbac_migrations Unified organization migration completed in 0.0391 seconds
2020-04-23 08:01:29,831 DEBUG    rbac_migrations No changes to role parents for 0 resources
2020-04-23 08:01:29,831 DEBUG    rbac_migrations Added parents to 0 roles
2020-04-23 08:01:29,831 DEBUG    rbac_migrations Removed parents from 0 roles
2020-04-23 08:01:29,832 INFO     rbac_migrations Rebuild parentage completed in 0.004574 seconds
  Applying main.0109_v370_job_template_organization_field... OK
  Applying main.0110_v370_instance_ip_address... OK
  Applying main.0111_v370_delete_channelgroup... OK
  Applying main.0112_v370_workflow_node_identifier... OK
  Applying main.0113_v370_event_bigint... OK
  Applying main.0114_v370_remove_deprecated_manual_inventory_sources... OK
  Applying oauth2_provider.0002_08_updates... OK
  Applying oauth2_provider.0003_auto_20160316_1503... OK
  Applying oauth2_provider.0004_auto_20160525_1623... OK
  Applying oauth2_provider.0005_auto_20170514_1141... OK
  Applying oauth2_provider.0006_auto_20171214_2232... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying social_django.0001_initial... OK
  Applying social_django.0002_add_related_name... OK
  Applying social_django.0003_alter_email_max_length... OK
  Applying social_django.0004_auto_20160423_0400... OK
  Applying social_django.0005_auto_20160727_2333... OK
  Applying social_django.0006_partial... OK
  Applying social_django.0007_code_timestamp... OK
  Applying social_django.0008_partial_timestamp... OK
  Applying sso.0001_initial... OK
  Applying sso.0002_expand_provider_options... OK
  Applying taggit.0003_taggeditem_add_unique_index... OK
bash-4.4# 

After this I do get the login prompt, but somehow I cannot log in.

@roedie commented Apr 23, 2020

After the migrations I still get the crashing dispatcher:

2020-04-23 08:19:14,009 INFO spawned: 'dispatcher' with pid 25200
2020-04-23 08:19:15,011 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-04-23 08:19:17,893 WARNING  awx.main.dispatch.periodic periodic beat started
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id
2020-04-23 08:19:18,418 INFO exited: dispatcher (exit status 1; not expected)

@JG127 (Author) commented Apr 23, 2020

Adding the logs of the installer and the logs of the very first "docker-compose up" command.

initial_start.log.tar.gz
ansible_install.log

I've got the impression the web service is waking up too early. It logs all sorts of errors for as long as the task service hasn't finished the migrations.

Or rather, the task service wakes up late for the migrations. It's only at the end of the log file that it actually begins to do something.

The net result, however, is that the system is functional, albeit it took its time to get to that point. Maybe a network timeout somewhere?
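
If this really is a startup race, a minimal sketch of a check one could run while waiting, assuming the service and database names from the generated ~/.awx/awxcompose/docker-compose.yml (postgres service, awx user and database); run it from that directory:

# poll until the migrations have created the main_instance table
until docker-compose exec -T postgres psql -U awx -d awx -c 'SELECT 1 FROM main_instance LIMIT 1;' >/dev/null 2>&1; do
  echo "AWX migrations not finished yet..."
  sleep 5
done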

@ryanpetrello (Contributor):

I agree, @JG127, this does sound like some sort of timing issue/race on startup. I've still been unable to reproduce, so if any of you find any additional clues, please let me know and I'm glad to help dig.

@JG127 (Author) commented Apr 24, 2020

The only thing coming to mind is the Python environment used to run the installer and the application. I always work in a virtualenv when doing Python projects; otherwise I'd end up using the libraries installed by the OS.

This is a detailed description of what I do to set up the runtime environment:

# Make certain no third-party configuration impacts the process
$ mv ~/.local ~/.local_disabled
$ sudo mv /etc/ansible/ansible.cfg /etc/ansible/ansible.cfg_disabled
$ sudo mv ~/.awx ~/.awx_disabled

# Clean Docker completely 
$ docker stop $(docker ps -q)
$ docker rm $(docker ps -qa)
$ docker rmi -f $(docker image ls -q)
$ docker system prune -f
$ docker builder prune -f
$ docker volume prune -f

# Create the runtime environment
$ virtualenv -p python2 venv
Running virtualenv with interpreter ~/.pyenv/shims/python2
Already using interpreter /usr/bin/python2
New python executable in ~/Projects/awx/venv/bin/python2
Also creating executable in ~/Projects/awx/venv/bin/python
Installing setuptools, pip, wheel...
done.
$ source venv/bin/activate
(venv) $ pip install ansible docker-compose
  ...
  ...
  ...
(venv) $ pip freeze
ansible==2.9.7
attrs==19.3.0
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.7.0.1
bcrypt==3.1.7
cached-property==1.5.1
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
configparser==4.0.2
contextlib2==0.6.0.post1
cryptography==2.9.2
docker==4.2.0
docker-compose==1.25.5
dockerpty==0.4.1
docopt==0.6.2
enum34==1.1.10
functools32==3.2.3.post2
idna==2.9
importlib-metadata==1.6.0
ipaddress==1.0.23
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
paramiko==2.7.1
pathlib2==2.3.5
pycparser==2.20
PyNaCl==1.3.0
pyrsistent==0.16.0
PyYAML==5.3.1
requests==2.23.0
scandir==1.10.0
six==1.14.0
subprocess32==3.5.4
texttable==1.6.2
urllib3==1.25.9
websocket-client==0.57.0
zipp==1.2.0

(venv) $ python --version
Python 2.7.17

(venv) $ ansible --version
ansible 2.9.7
  config file = None
  configured module search path = [u'~/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = ~/Projects/awx/venv/local/lib/python2.7/site-packages/ansible
  executable location = ~/Projects/awx/venv/bin/ansible
  python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]

(venv) $ docker-compose --version
docker-compose version 1.25.5, build unknown

(venv) $ docker --version
Docker version 19.03.8, build afacb8b7f0

(venv) $ docker system info
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.3.0-46-generic
 Operating System: Linux Mint 19.3
 OSType: linux
 Architecture: x86_64
 CPUs: 7
 Total Memory: 7.773GiB
 Name: workvm
 ID: DGRT:4RDB:6YC2:QTEB:U3IL:HDDQ:VCIT:HSUW:L344:KORB:SAPZ:MXIB
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

(venv) $ cd installer

(venv) $ ansible-playbook -i inventory install.yml

PLAY [Build and deploy AWX] ******************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************
ok: [localhost]

TASK [check_vars : include_tasks] ************************************************************************************************************************************
skipping: [localhost]

TASK [check_vars : include_tasks] ************************************************************************************************************************************
included: /home/jan/Projects/awx/installer/roles/check_vars/tasks/check_docker.yml for localhost

TASK [check_vars : postgres_data_dir should be defined] **************************************************************************************************************
ok: [localhost] => {
    "changed": false, 
    "msg": "All assertions passed"
}

TASK [check_vars : host_port should be defined] **********************************************************************************************************************
ok: [localhost] => {
    "changed": false, 
    "msg": "All assertions passed"
}

TASK [image_build : Set global version if not provided] **************************************************************************************************************
skipping: [localhost]

TASK [image_build : Verify awx-logos directory exists for official install] ******************************************************************************************
skipping: [localhost]

TASK [image_build : Copy logos for inclusion in sdist] ***************************************************************************************************************
skipping: [localhost]

TASK [image_build : Set sdist file name] *****************************************************************************************************************************
skipping: [localhost]

TASK [image_build : AWX Distribution] ********************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stat distribution file] **************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Clean distribution] ******************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Build sdist builder image] ***********************************************************************************************************************
skipping: [localhost]

TASK [image_build : Build AWX distribution using container] **********************************************************************************************************
skipping: [localhost]

TASK [image_build : Build AWX distribution locally] ******************************************************************************************************************
skipping: [localhost]

TASK [image_build : Set docker build base path] **********************************************************************************************************************
skipping: [localhost]

TASK [image_build : Set awx_web image name] **************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Set awx_task image name] *************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Ensure directory exists] *************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage sdist] *************************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Template web Dockerfile] *************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Template task Dockerfile] ************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage launch_awx] ********************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage launch_awx_task] ***************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage google-cloud-sdk.repo] *********************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage rsyslog.repo] ******************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage rsyslog.conf] ******************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage supervisor.conf] ***************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage supervisor_task.conf] **********************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage settings.py] *******************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage requirements] ******************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage config watcher] ****************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Stage Makefile] **********************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Build base web image] ****************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Build base task image] ***************************************************************************************************************************
skipping: [localhost]

TASK [image_build : Tag task and web images as latest] ***************************************************************************************************************
skipping: [localhost]

TASK [image_build : Clean docker base directory] *********************************************************************************************************************
skipping: [localhost]

TASK [image_push : Authenticate with Docker registry if registry password given] *************************************************************************************
skipping: [localhost]

TASK [image_push : Remove web image] *********************************************************************************************************************************
skipping: [localhost]

TASK [image_push : Remove task image] ********************************************************************************************************************************
skipping: [localhost]

TASK [image_push : Tag and push web image to registry] ***************************************************************************************************************
skipping: [localhost]

TASK [image_push : Tag and push task image to registry] **************************************************************************************************************
skipping: [localhost]

TASK [image_push : Set full image path for Registry] *****************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Generate broadcast websocket secret] **************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : fail] *********************************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : include_tasks] ************************************************************************************************************************************
skipping: [localhost] => (item=openshift_auth.yml) 
skipping: [localhost] => (item=openshift.yml) 

TASK [kubernetes : include_tasks] ************************************************************************************************************************************
skipping: [localhost] => (item=kubernetes_auth.yml) 
skipping: [localhost] => (item=kubernetes.yml) 

TASK [kubernetes : Use kubectl or oc] ********************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : set_fact] *****************************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Record deployment size] ***************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set expected post-deployment Replicas value] ******************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Delete existing Deployment (or StatefulSet)] ******************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Get Postgres Service Detail] **********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Template PostgreSQL Deployment (OpenShift)] *******************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Deploy and Activate Postgres (OpenShift)] *********************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Create Temporary Values File (Kubernetes)] ********************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Populate Temporary Values File (Kubernetes)] ******************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Deploy and Activate Postgres (Kubernetes)] ********************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Remove tempfile] **********************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set postgresql hostname to helm package service (Kubernetes)] *************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for Postgres to activate] ********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Check if Postgres 9.6 is being used] **************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set new pg image] *********************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for change to take affect] *******************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set env var for pg upgrade] ***********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for change to take affect] *******************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set env var for new pg version] *******************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for Postgres to redeploy] ********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for Postgres to finish upgrading] ************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Unset upgrade env var] ****************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for Postgres to redeploy] ********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set task image name] ******************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Set web image name] *******************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Determine Deployment api version] *****************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Render deployment templates] **********************************************************************************************************************
skipping: [localhost] => (item=None) 
skipping: [localhost] => (item=None) 
skipping: [localhost] => (item=None) 
skipping: [localhost] => (item=None) 
skipping: [localhost] => (item=None) 
skipping: [localhost]

TASK [kubernetes : Apply Deployment] *********************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Delete any existing management pod] ***************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Template management pod] **************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Create management pod] ****************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Wait for management pod to start] *****************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Migrate database] *********************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Check for Tower Super users] **********************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : create django super user if it does not exist] ****************************************************************************************************
skipping: [localhost]

TASK [kubernetes : update django super user password] ****************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Create the default organization if it is needed.] *************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Delete management pod] ****************************************************************************************************************************
skipping: [localhost]

TASK [kubernetes : Scale up deployment] ******************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Generate broadcast websocket secret] ************************************************************************************************************
ok: [localhost]

TASK [local_docker : Check for existing Postgres data] ***************************************************************************************************************
ok: [localhost]

TASK [local_docker : Record Postgres version] ************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Determine whether to upgrade postgres] **********************************************************************************************************
ok: [localhost]

TASK [local_docker : Set up new postgres paths pre-upgrade] **********************************************************************************************************
skipping: [localhost] => (item=~/.awx/pgdocker/10/data) 

TASK [local_docker : Stop AWX before upgrading postgres] *************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Upgrade Postgres] *******************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Copy old pg_hba.conf] ***************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Remove old data directory] **********************************************************************************************************************
ok: [localhost]

TASK [local_docker : Export Docker web image if it isnt local and there isnt a registry defined] *********************************************************************
skipping: [localhost]

TASK [local_docker : Export Docker task image if it isnt local and there isnt a registry defined] ********************************************************************
skipping: [localhost]

TASK [local_docker : Set docker base path] ***************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Ensure directory exists] ************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Copy web image to docker execution] *************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Copy task image to docker execution] ************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Load web image] *********************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Load task image] ********************************************************************************************************************************
skipping: [localhost]

TASK [local_docker : Set full image path for local install] **********************************************************************************************************
skipping: [localhost]

TASK [local_docker : Set DockerHub Image Paths] **********************************************************************************************************************
ok: [localhost]

TASK [local_docker : Create ~/.awx/awxcompose directory] *************************************************************************************************************
changed: [localhost]

TASK [local_docker : Create Redis socket directory] ******************************************************************************************************************
changed: [localhost]

TASK [local_docker : Create Memcached socket directory] **************************************************************************************************************
changed: [localhost]

TASK [local_docker : Create Docker Compose Configuration] ************************************************************************************************************
changed: [localhost] => (item=environment.sh)
changed: [localhost] => (item=credentials.py)
changed: [localhost] => (item=docker-compose.yml)
changed: [localhost] => (item=nginx.conf)
changed: [localhost] => (item=redis.conf)

TASK [local_docker : Set redis config to other group readable to satisfy redis-server] *******************************************************************************
changed: [localhost]

TASK [local_docker : Render SECRET_KEY file] *************************************************************************************************************************
changed: [localhost]

TASK [local_docker : Start the containers] ***************************************************************************************************************************

(venv) $ cd ..
(venv) $ docker-compose logs -f
...
...
the errors
...
...

I repeated the process with Python 3, since Python 2 is deprecated and AWX uses Python 3 to run Ansible. It's the very same routine as described above, except for the virtualenv command.

Replace

virtualenv -p python2 venv

with

virtualenv -p python3 venv

A check of the versions:

(venv) $ python --version
Python 3.6.9

(venv) $ ansible --version
ansible 2.9.7
  config file = None
  configured module search path = ['~/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = ~/Projects/awx/venv/lib/python3.6/site-packages/ansible
  executable location = ~/Projects/awx/venv/bin/ansible
  python version = 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]

(venv) $ pip freeze
ansible==2.9.7
attrs==19.3.0
bcrypt==3.1.7
cached-property==1.5.1
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
cryptography==2.9.2
docker==4.2.0
docker-compose==1.25.5
dockerpty==0.4.1
docopt==0.6.2
idna==2.9
importlib-metadata==1.6.0
Jinja2==2.11.2
jsonschema==3.2.0
MarkupSafe==1.1.1
paramiko==2.7.1
pycparser==2.20
PyNaCl==1.3.0
pyrsistent==0.16.0
PyYAML==5.3.1
requests==2.23.0
six==1.14.0
texttable==1.6.2
urllib3==1.25.9
websocket-client==0.57.0
zipp==3.1.0

The very same errors pop up.

A long shot is that by some fluke you are using different versions of the Docker images involved. You might want to check the image IDs to make certain they are the same.

$ docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
redis               latest              a4d3716dbb72        13 hours ago        98.3MB
postgres            10                  b500168be260        17 hours ago        200MB
ansible/awx_task    11.0.0              83a56dfe4148        7 days ago          2.52GB
ansible/awx_web     11.0.0              ab9667094eac        7 days ago          2.48GB
memcached           alpine              acce7f7ac2ef        10 days ago         9.22MB

(Why use both Redis and Memcached, btw?)

And a very long shot is that it makes a difference to run this in a virtual machine. I am using VirtualBox 6.1.16 on a Windows 10 host, as per company regulations.

@JG127 (Author) commented Apr 24, 2020

Maybe this will shed some light ...

While the task service is just sitting there, it consumes 100% CPU. The processes:

# ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:14 ?        00:00:00 tini -- sh -c /usr/bin/launch_awx_task.sh
root         6     1  0 11:14 ?        00:00:00 bash /usr/bin/launch_awx_task.sh
root        69     6 95 11:15 ?        00:02:18 /var/lib/awx/venv/awx/bin/python3 /usr/bin/awx-manage migrate --noinput

Is there something I can try to find out what it is doing?

@ryanpetrello (Contributor):

What do the task logs/stdout say?

Maybe try something like strace or gdb on that awx-manage migrate process to see what it's doing?
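
For example, a minimal sketch of attaching strace from inside the task container (the container name awx_task and PID 69 come from the ps output above; having strace installable and the SYS_PTRACE capability available are assumptions about the environment):

docker exec -it awx_task bash
bash-4.4# yum install -y strace    # the awx images are CentOS-based
bash-4.4# strace -f -p 69          # attach to the awx-manage migrate process and follow its children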

@JG127 (Author) commented Apr 24, 2020

No logging at all, neither in the Docker logs nor in the log files in /var/log. This means the issue happens very early in the code, before it logs anything.
It might really be stuck on a network connection after all, if the timeout mechanism does not rely on blocking I/O.
Surely it's such a silly problem it doesn't even come to mind :-)

@Naf3tsR commented Apr 24, 2020

I'm experiencing the same problem. I'm not able to start any release; I've tried 9.3.0, 10.0.0 and 11.1.0.
My error looks like this:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 87, in _ctit_db_wrapper
    yield
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 415, in __getattr__
    value = self._get_local(name)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/conf/settings.py", line 358, in _get_local
    setting = Setting.objects.filter(key=name, user__isnull=True).order_by('pk').first()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 653, in first
    for obj in (self if self.ordered else self.order_by('pk'))[:1]:
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1131, in execute_sql
    cursor = self.connection.cursor()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in _cursor
    self.ensure_connection()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: FATAL:  no pg_hba.conf entry for host "172.20.0.6", user "awx", database "awx", SSL off

2020-04-24 13:23:30,507 ERROR    awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  no pg_hba.conf entry for host "172.20.0.6", user "awx", database "awx", SSL off

@JG127
Author

JG127 commented Apr 28, 2020

If this were a Java-based system I would dump the threads or do a CPU profile. Python however is a bit new to me. Is there a way to CPU-profile a Python process?
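
One option is py-spy, which can attach to a running Python process without restarting it (the PID below is illustrative; inside a container it may need the SYS_PTRACE capability):

pip install py-spy
py-spy dump --pid 12345   # one-off stack dump: shows where the process is stuck
py-spy top --pid 12345    # live, top-like view of where CPU time is spent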

@stmorim

stmorim commented Apr 30, 2020

I ran into this problem today as well. I was able to fix it by deleting all of the running docker containers and then running ansible-playbook -i inventory install.yml again. It took less than 2 minutes for the GUI to come up and I was able to log in.
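
In script form that is roughly the following (note: this removes all containers on the host, not just the AWX ones):

docker stop $(docker ps -q)
docker rm $(docker ps -aq)
ansible-playbook -i inventory install.yml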

Hope this helps.

@Naf3tsR

Naf3tsR commented May 5, 2020

I am getting the same error as @JG127.

Starting with:

awx_web      | Traceback (most recent call last):
awx_web      |   File "/usr/bin/awx-manage", line 8, in <module>
awx_web      |     sys.exit(manage())
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
awx_web      |     execute_from_command_line(sys.argv)
awx_web      |   File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
awx_web      |     utility.execute()

The fresh install does not work for me, for whatever reason.

In addition to that I get the error I've posted in an earlier post.
It says that pg_hba.conf is not configured properly and there is
no entry for host "172.20.0.6".

Therefore, I've inspected pg_hba.conf, and indeed there is no allow entry for that host. I then modified pg_hba.conf like this, allowing everything in 172.16.0.0/12:

# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             172.16.0.0/12           trust

After saving the changes and starting the containers again, the problem is gone.

It looks like, because of the error in __init__.py, pg_hba.conf does not get populated correctly.
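
For reference, the same change can be scripted from the host (a sketch; the pgdata path matches the containerized install's PGDATA, but verify it on your system):

docker exec awx_postgres bash -c "echo 'host all all 172.16.0.0/12 trust' >> /var/lib/postgresql/data/pgdata/pg_hba.conf"
docker restart awx_postgres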

@Benoit-LAGUET

@Naf3tsR It works for me too!
Good job and thanks :)

@JG127
Author

JG127 commented May 5, 2020

Yes! :-) Can this be fixed via Docker (Compose)? I'd rather not hack my way in.

@sangrealest

@Naf3tsR Thanks, it works for me. I'm using version 11.2.0, which still has the issue.

@alishahrestani

Hi, I had the same problem; it was resolved by using PostgreSQL 12.

@JG127
Author

JG127 commented May 19, 2020

Upgrading to PostgreSQL 12 didn't help for me.

@bryanasdev000

bryanasdev000 commented May 21, 2020

I'm having the same problem in version 11.2.0, using docker-compose; however, on the second attempt it always works. I'm using PostgreSQL in compose as well. My logs are similar to @roedie's.

Out of curiosity, did anyone have the problem using OpenShift or Kubernetes?

@bpetit

bpetit commented Jun 26, 2020

I just tried a fresh install of 13.0.0 (docker-compose mode on Debian 10). It seems to give the "main_instance" error too:

2020-06-26 08:38:09,393 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:38:09,393 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:38:10,406 INFO spawned: 'dispatcher' with pid 160
2020-06-26 08:38:10,406 INFO spawned: 'dispatcher' with pid 160
2020-06-26 08:38:11,409 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:38:11,409 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "main_instance" does not exist
LINE 1: SELECT (1) AS "a" FROM "main_instance" WHERE "main_instance"...
                               ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 100, in me
    if node.exists():
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/query.py", line 766, in exists
    return self.query.has_results(using=self.db)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/query.py", line 522, in has_results
    return compiler.has_results()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1110, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1140, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "main_instance" does not exist
LINE 1: SELECT (1) AS "a" FROM "main_instance" WHERE "main_instance"...

Then I ran awx-manage migrate, which left me with:

2020-06-26 08:46:19,276 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:46:19,276 INFO exited: dispatcher (exit status 1; not expected)
2020-06-26 08:46:20,287 INFO spawned: 'dispatcher' with pid 801
2020-06-26 08:46:20,287 INFO spawned: 'dispatcher' with pid 801
2020-06-26 08:46:21,291 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:46:21,291 INFO success: dispatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-06-26 08:46:23,396 WARNING awx.main.dispatch.periodic periodic beat started
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/__init__.py", line 154, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 55, in handle
    reaper.reap()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
    (changed, me) = Instance.objects.get_or_register()
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 144, in get_or_register
    return (False, self.me())
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/managers.py", line 102, in me
    raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id

@bryanasdev000

bryanasdev000 commented Jun 27, 2020

Tried with the latest version (v13.0.0) on a clean env (Debian 10) and didn't have issues.

NOTE: Using an external PostgreSQL 11.8 database.

@anxstj
Contributor

anxstj commented Jul 22, 2020

I see this problem in my CI job that builds my awx containers. Approximately one out of 10 starts fails.

It seems that something is killing (or restarting) the postgres container quite early:

root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_postgres
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok
...
server started
CREATE DATABASE
...
2020-07-22 14:30:45.852 UTC [76] ERROR:  relation "django_migrations" does not exist at character 124
2020-07-22 14:30:45.852 UTC [76] STATEMENT:  SELECT "django_migrations"."id", "django_migrations"."app", "django_migrations"."name", "django_migrations"."applied" FROM "django_migrations" WHERE ("django_migrations"."app" = 'main' AND NOT ("django_migrations"."name"::text LIKE '%squashed%')) ORDER BY "django_migrations"."id" DESC  LIMIT 1
root@runner-hgefapak-project-60-concurrent-1:/#     # !!!! HERE something has stopped the container !!!!
root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_postgres
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...

Which in turn causes the migration task inside the awx_task container to fail:

root@runner-hgefapak-project-60-concurrent-1:/# docker logs -f awx_task
Using /etc/ansible/ansible.cfg as config file
127.0.0.1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "elapsed": 0,
    "match_groupdict": {},
    "match_groups": [],
    "path": null,
    "port": 5432,
    "search_regex": null,
    "state": "started"
}
Using /etc/ansible/ansible.cfg as config file
127.0.0.1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "db": "awx"
}
Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying taggit.0001_initial... OK
  Applying taggit.0002_auto_20150616_2121... OK
  Applying auth.0001_initial... OK
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/django/db/backends/utils.py", line 82, in _execute
    return self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request

Restarting the awx_task container seems to restart the migration process, which then works (supervisorctl restart all doesn't help).

So the question is, what is restarting the awx_postgres container?

From the postgres container entrypoint:

			docker_temp_server_start "$@"

			docker_setup_db
			docker_process_init_files /docker-entrypoint-initdb.d/*

			docker_temp_server_stop

If the migrate process starts during the postgresql initialization, then the connection will be dropped as soon as the temp_server stops.
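
If that is what happens, one crude way to wait out the init phase from the host is to watch for the second "ready to accept connections" line, since on a first initialization the official postgres entrypoint logs it once for the temporary server and once for the real one (a sketch; assumes the container is named awx_postgres):

until [ "$(docker logs awx_postgres 2>&1 | grep -c 'ready to accept connections')" -ge 2 ]; do
  sleep 2   # still initializing: only the temporary server has come up so far
done
docker exec awx_task awx-manage migrate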

@johnrkriter

Came here while investigating an issue during a fresh Release 13.0.0 install that was also giving me ERROR: relation "conf_setting" does not exist at character 158.

I ended up reverting to release 12.0.0 while troubleshooting, assuming it was a dirty release.
This is a fresh docker-compose install using containerized Postgres.

I was able to eventually get into the web UI for 12.0.0 in two steps.

  1. (Credit @JG127 ) I first exec into the awx_web container and ran awx-manage migrate. This completed the migration without errors.
  2. (Credit @anxstj) I then restarted awx_task container. I can now see that postgres appears happy

This appears to be a valid workaround for a new docker-compose install, at least with my config.

@tumluliu

Ran into the same UndefinedTable: relation "main_instance" does not exist problem with 14.1.0, installed with docker-compose and vanilla settings. Fixed by

  1. running awx-manage migrate within awx_task container
  2. restarting awx_task container

as mentioned by @johnrkriter (thanks a lot!)
But I could not understand what the major difficulty is in resolving this.

@kakkotetsu

I had the same problem, and it was resolved the same way...
(Ubuntu 20.04.1 + AWX 14.1.0 + docker-compose clean install)

@arashnikoo

The workaround works for me as well. Is this going to be fixed?

@Nuttymoon

This issue is still around with AWX 15.0.0 on a docker-compose deployment.
The workaround of @johnrkriter works:

docker exec awx_task awx-manage migrate
docker container restart awx_task

@JG127
Author

JG127 commented Oct 7, 2020

It's a pity that nobody from the project seems interested... :-(

@jklare

jklare commented Oct 8, 2020

This looks like a race condition between the pg container and the awx-task container. Since I am not familiar with the project structure, it will probably take me some time to find the right place to look :) I will update this as soon as I find something.

@jklare

jklare commented Oct 8, 2020

So I think the description from @anxstj is pretty accurate and complete; we now just need to figure out a good way to wait for the postgres init to finish before we start the migration. Does anybody have a good idea how to do that?
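
One sketch (not a tested installer change): require several consecutive successful probes before declaring postgres ready, so that a brief answer from the temporary init server doesn't count. This assumes psql is available in the task container, the database host is named postgres, and PGPASSWORD is set for the awx user:

ok=0
while [ "$ok" -lt 5 ]; do
  if psql -h postgres -p 5432 -U awx -d awx -c 'SELECT 1' >/dev/null 2>&1; then
    ok=$((ok + 1))   # one more consecutive success
  else
    ok=0             # any failure resets the count, e.g. the temp server stopping
  fi
  sleep 2
done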

@jklare

jklare commented Oct 8, 2020

Judging from the issue itself, I think the best option to fix this is to make the awx-task container (or the script running in it) fail if the migration fails, instead of trying to continue with something that will never succeed. The db migrations themselves should be idempotent, so just failing and starting fresh should be fine.
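
A minimal sketch of that idea in the task container's launch script (the script name and surrounding steps are illustrative, not the actual AWX entrypoint):

#!/bin/bash
set -e                                      # abort on the first failing command
awx-manage migrate                          # a failed migration now exits the script,
exec supervisord -c /etc/supervisord.conf   # so services only start after a clean migration

Combined with a restart policy on the container, Docker would then bring it back up and retry the migration from a clean state.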

dasJ added a commit to dasJ/awx that referenced this issue Oct 30, 2020
This change will make docker-compose wait for both the redis and the
postgres container before starting the other services.

related ansible#6792
@dasJ

dasJ commented Oct 30, 2020

Does it work after applying #8497?

dasJ added a commit to dasJ/awx that referenced this issue Oct 30, 2020
This change will make docker-compose wait for both the redis and the
postgres container before starting the other services.

related ansible#6792
dasJ added a commit to dasJ/awx that referenced this issue Oct 30, 2020
This change will make the task container wait for postgres before
running the migrations.

related ansible#6792
@jklare

jklare commented Oct 30, 2020

@dasJ I think the patch could still run into the issue where the "while loop" exits because it successfully accesses the instance of the "docker_temp_server" that will be killed directly after. It makes it a lot more unlikely, but I do not see how the patch would completely avoid this case. I think the real fix here is to make the awx-task container completely fail on this error and start fresh (not sure if the "set -e" is enough here, since IIRC all errors are still retried when running "awx-manage migrate").

@dasJ

dasJ commented Oct 30, 2020

@jklare wdym by "docker_temp_server"? I can neither find it in this repo nor when googling

@jklare

jklare commented Oct 30, 2020

@dasJ it is mentioned in the description @anxstj gave a couple of posts above. This temp postgres server instance is basically run when starting the postgres container, to perform the initial database setup (https://github.com/docker-library/postgres/blob/master/11/docker-entrypoint.sh#L297-L302). And since this temp server is available and accepting connections, and then killed/stopped shortly after, this causes the race condition.

@dasJ

dasJ commented Oct 30, 2020 via email

@jklare

jklare commented Oct 30, 2020

Yeah, I think it will be a lot less likely to happen (just running any command will already help here), but I think the real fix would be to make the awx-task container fail hard and exit if the db migration fails for any reason. It will be automatically restarted after exiting, the migration will be retried, and it will work the second time for sure. I have not had time yet to investigate where to make the migration task inside the awx-task container fail, sorry for not being super helpful here :(

@dasJ

dasJ commented Oct 30, 2020

I was hoping my set -e does exactly that. If the processes tend to retry their migrations, I'm honestly too lazy to investigate this ;)

@sxa

sxa commented Dec 12, 2020

Had a lot of head scratching in the last week as I'd been getting this 100% repeatedly on my system. Two solutions are referenced in this issue, but bear in mind you have to leave about a 2½ minute gap after running the initial playbook (assuming the database is being created from scratch) before taking the remedial action (my system is a quad-core Atom C2550 with 8GB RAM running Ubuntu 20.04 LTS).

It would be very useful to get this resolved, since the out-of-the-box experience with the simplest AWX configuration seems to be problematic (so it likely inhibits adoption) and has been for a while (I tried from HEAD back to, I think, version 10 at the earliest).

So, to reiterate, the two scriptable solutions are:

ansible-playbook -v -i inventory install.yml
sleep 150
docker exec awx_task awx-manage migrate
docker container restart awx_task
sleep 240

as mentioned earlier in this thread by a few people (although it gave me an exception on the upgrade: psycopg2.errors.UndefinedColumn: column "authorize" of relation "main_credential" does not exist, but it still seems to work) or:

ansible-playbook -v -i inventory install.yml
sleep 150
ansible-playbook -v -i inventory install.yml
sleep 240

as per #6931

@JG127
Author

JG127 commented Jan 14, 2021

Any chance this is fixed in 16.0.0 ?

@shanemcd
Member

No updates on this issue in a while. Going to assume it was fixed or not relevant for newer versions.

@Shivu1434

TASK [local_docker : Check for existing Postgres data (run from inside the container for access to file)] ***********************************************************************************************************************************
task path: /root/awx/installer/roles/local_docker/tasks/upgrade_postgres.yml:16
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker run --rm -v '/var/lib/pgdocker:/var/lib/postgresql' centos:8 bash -c "[[ -f /var/lib/postgresql/10/data/PG_VERSION ]] && echo 'exists'"\n", "delta": "0:00:00.424388", "end": "2022-06-03 01:00:01.355937", "msg": "non-zero return code", "rc": 1, "start": "2022-06-03 01:00:00.931549", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring

@Shivu1434

I'm facing the above-mentioned issue while running
ansible-playbook -i inventory install.yml
