10.0 redis clustering #26

Closed
chris93111 opened this issue Apr 14, 2020 · 17 comments

Comments

@chris93111

Hi @sujiar37

10.0 is available, but RabbitMQ has been replaced by Redis.

I have seen that a Redis cluster is possible with a master and a slave, but I don't know if that is the intended setup for an AWX cluster,
because it uses a socket.

https://github.com/ansible/awx/pull/6034/files

Have you seen this? Do you have any idea how to integrate Redis into the cluster?

@chris93111
Author

I may have found the answer: AWX now uses Postgres for inter-node communication. I will try it and send feedback.

@sebstyle

Indeed 11.0.0 is the latest tag.
On each node, the local Redis communicates with the local websockets connected to the web interface.
Events relevant to the entire cluster are now communicated through PostgreSQL.
In future versions, Redis will most likely also take over the role of Memcached.

Custom clustering with RabbitMQ, as this playbook provides, no longer seems to be relevant.

@sebstyle

ansible/awx#5443

@chris93111
Author

@sebstyle yes, clustering uses PostgreSQL notifications for inter-node communication.

@ryanpetrello

Just chiming in - AWX isn't using any sort of redis clustering; every redis on every node is unaware of the others (if you all have questions, I'm happy to answer them).

@RylandDeGregory

> Just chiming in - AWX isn't using any sort of redis clustering; every redis on every node is unaware of the others (if you all have questions, I'm happy to answer them).

Hi Ryan,

What is the high-level process for clustering Local Docker installations now that Redis is implemented and inter-node communication is done through PostgreSQL?

@moonrail

moonrail commented May 4, 2020

I have clustered some AWX nodes just by writing a wrapper around the official role local_docker.

I have not fully tested every feature, but these are the steps required that I know of:

  • clone AWX-Repository by Tag on execution host to e.g. /tmp/awx
  • create dir ~/.awx/awxcompose on target host
  • copy /tmp/awx/installer/roles/image_build/files/settings.py to ~/.awx/awxcompose/settings.py on target host
  • replace CLUSTER_HOST_ID = "awx" of copied settings.py with your AWX nodes accessible FQDN, e.g. CLUSTER_HOST_ID = "host.domain.tld"
    • lineinfile with regexp did the trick for me
  • note that your individual secrets should be the same across all nodes, so store them somewhere safe, in a vault for example:
  • the only real "hack" here is the following:
    • we have to somehow include ~/.awx/awxcompose/settings.py in each awx_web & awx_task container
    • as here and here project_data_dir is used in volumes in docker-compose.yml, we can replace this with a multiline string, to give us another volume without needing to modify the official role local_docker
    • see the vars passed to include_role below for a basic example
  • then just use Ansible's include_role and pass all the vars that are normally read from the inventory file:
- name: 'include official installation role "local_docker"'
  include_role:
    # path to the official role inside the cloned AWX repository
    name: '/tmp/awx/installer/roles/local_docker'
  vars:
    # secrets must be identical on every node
    secret_key: '{{ lookup("your_secret_provider_plugin", "params") }}'
    admin_password: '{{ lookup("your_secret_provider_plugin", "params") }}'
    broadcast_websocket_secret: '{{ lookup("your_secret_provider_plugin", "params") }}'
    pg_hostname: 'your_postgres_server'
    # multiline string: adds the settings.py volume ahead of the real project data dir
    project_data_dir: "      - \"~/.awx/awxcompose/settings.py\":\"/etc/tower/settings.py\"\n      - \"{{ your_desired_project_data_dir }}"
    (...)
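
A minimal sketch of the preparation steps listed above, written as Ansible tasks that would run before the include_role task. The variable names awx_version and awx_node_fqdn are hypothetical and not part of the official role. The sketch also assumes the stock settings.py contains a line starting with CLUSTER_HOST_ID, as described above.

```yaml
# Hypothetical preparation tasks; awx_version and awx_node_fqdn are assumed variables.
- name: 'clone the AWX repository by tag on the execution host'
  git:
    repo: 'https://github.com/ansible/awx.git'
    dest: '/tmp/awx'
    version: '{{ awx_version }}'
  delegate_to: localhost
  run_once: true

- name: 'create the compose directory on the target host'
  file:
    path: '~/.awx/awxcompose'
    state: directory

- name: 'copy the stock settings.py from the execution host to the target host'
  copy:
    src: '/tmp/awx/installer/roles/image_build/files/settings.py'
    dest: '~/.awx/awxcompose/settings.py'

- name: 'set CLUSTER_HOST_ID to the FQDN of this node'
  lineinfile:
    path: '~/.awx/awxcompose/settings.py'
    regexp: '^CLUSTER_HOST_ID\s*='
    line: 'CLUSTER_HOST_ID = "{{ awx_node_fqdn }}"'
```

Because lineinfile matches on the regexp, re-running the tasks replaces the existing line instead of appending a second one.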

@fitbeard

fitbeard commented May 4, 2020

Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster
I borrowed ideas from this repo a long time ago, and I have now been running an 11+ HA cluster in prod for some time. I am still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

@RylandDeGregory

> Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster
> I borrowed ideas from this repo a long time ago, and I have now been running an 11+ HA cluster in prod for some time. I am still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

I've used a lot of your functionality in making my own AWX 11+ installer, particularly your use of a Key Vault lookup plugin! (You use Hashicorp, I use AKV, but same idea!) I'd been wrestling for a while with how to use the inventory file securely...but avoiding it completely is entirely nicer.

@bryanasdev000

> Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster
> I borrowed ideas from this repo a long time ago, and I have now been running an 11+ HA cluster in prod for some time. I am still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

Thanks for sharing it! Gonna test ASAP.

@chris93111
Author

chris93111 commented Jun 4, 2020

Hi @ryanpetrello

Do you know why the web node tries to contact every worker node in the cluster on the same port as the web node?

The worker is not exposed with Docker.

AWX version 11.2

2020-06-04 15:36:04,890 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker attempt number 10.
2020-06-04 15:36:04,892 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker failed: 'Cannot connect to host vldvaawx01-worker:443 ssl:False [Connect call failed ('xxxxxxxx', 443)]'.
2020-06-04 15:36:04,893 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker attempt number 10.
2020-06-04 15:36:09,897 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker failed: 'Cannot connect to host vldvaawx02-worker:443 ssl:False [Connect call failed ('xxxxxxxx', 443)]'.
2020-06-04 15:36:09,898 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker attempt number 11.
2020-06-04 15:36:09,900 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker failed: 'Cannot connect to host vldvaawx01-worker:443 ssl:False [Connect call failed ('xxxxxxxxx', 443)]'.
2020-06-04 15:36:09,901 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker attempt number 11.

thanks

@ryanpetrello

ryanpetrello commented Jun 4, 2020

@chris93111,

Yep. Starting with the removal of RabbitMQ, when a playbook runs on a certain node, the stdout events are broadcast to all other cluster nodes via websockets/ASGI over port 443.

This is how you can run a playbook on Node A, but view the streaming stdout results on Node B. Previously, our RabbitMQ clustering had a similar model which required shared network activity amongst the nodes, and this new ASGI traffic in the redis implementation is the analog to that behavior. So a requirement for this to work in any clustered AWX install is that each node/instance is routable to each other instance via some address on port 443.

from ansible/awx#5443:

> When an event is persisted to the database by the callback receiver, it also is broadcasted to all cluster peers via ASGI. In this way, if a playbook runs on Node A, users connected to Daphne on Nodes B, C, and D will receive a broadcast of these events and see the output in their browser tabs.
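
One way to check the port 443 requirement before installing is a small Ansible task run against every node. This is only a sketch: the inventory group name awx_nodes is hypothetical, and it assumes each node's inventory hostname is the address the other nodes use to reach it.

```yaml
# Hypothetical pre-flight check; the group name awx_nodes is an assumption.
- name: 'check that every other AWX node is reachable on port 443 from this node'
  wait_for:
    host: '{{ item }}'
    port: 443
    timeout: 10
  loop: '{{ groups["awx_nodes"] | difference([inventory_hostname]) }}'
```

wait_for runs on the managed node, so each node tests its own connection to every peer. The check only succeeds once something is listening on port 443 on the peers, so it is more useful for verifying an existing cluster than a fresh set of hosts.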

@chris93111
Author

@ryanpetrello thanks for your response. OK, I understand now! But in this config, a task can't run without web on the node? Every node must contain both the worker and web components, right?

@ryanpetrello

@chris93111 With the way AWX works today, that's correct.

@ryanpetrello

ryanpetrello commented Jun 4, 2020

@sujiar37 @chris93111 since we're here chatting, another thing you may care to know about this - though I don't think it strictly affects clustering - is that in the near future, we're considering removing memcached entirely (because redis sort of serves the same purpose). I don't anticipate any notable changes for you all downstream aside from "point Django's caching at redis instead of memcached"

Details here:

ansible/awx#6932
ansible/awx#7240

This is likely coming in the next major version of AWX in the coming weeks (12.0.0).

@chris93111
Author

@ryanpetrello yes, I have read about this in the Google Ansible project :)

Thanks for this information ;)

@sujiar37
Owner

sujiar37 commented Aug 4, 2020

@fitbeard This is great, thank you so much for your work and for helping the community with the recent updates. Unfortunately, due to other engagements, I couldn't look deeper or follow up after version 10.0 and its introduction of Redis.

@ryanpetrello Great to see you here. I appreciate your comments, and thank you for the information.
