
Purge behavior when calling cephadm-purge-cluster.yml with --limit #170

Closed
tobiasmcnulty opened this issue Sep 25, 2022 · 3 comments · Fixed by #259
@tobiasmcnulty
Contributor

tobiasmcnulty commented Sep 25, 2022

The cephadm-purge-cluster.yml pre-purge checks fail in a not-so-good way when one calls the playbook with a --limit to a subset of the hosts in the inventory (e.g., those hosts that are in the cluster).

As an alternative, the plays could have hosts: all set and the following added to each task to achieve the same behavior in a compatible way:

      delegate_to: localhost
      run_once: true

Otherwise, one sees skipping: no hosts matched on the applicable plays. The "confirm whether user really wants to purge the cluster" play even prompts the user for confirmation, but then continues to purge the cluster no matter what they type.

If this sounds like an acceptable solution I can make a PR.
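
For illustration, here is a minimal sketch of what one of the pre-purge check plays could look like with that change. The play and task names are taken from the output below; the exact fail message and condition are placeholders, not the actual contents of cephadm-purge-cluster.yml:

- name: check local prerequisites are in place
  hosts: all
  gather_facts: false
  become: false
  tasks:
    - name: fail if fsid was not provided
      # Runs once, on the controller, even when the playbook is invoked
      # with --limit, because any limited host still matches hosts: all.
      fail:
        msg: "You must pass the fsid of the cluster to purge (-e fsid=...)"
      when: fsid is undefined
      delegate_to: localhost
      run_once: true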

@guits
Collaborator

guits commented Sep 27, 2022

That shouldn't be an issue [1].

By the way, this is what we do in the CI. If you look at the inventory we use here [2], you can see we don't define a localhost host, and we use it here [3].

See a job in the CI [4]:

el8-functional run-test: commands[10] | ansible-playbook -vv -i /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/tests/functional/hosts /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml -e ireallymeanit=yes -e fsid=4217f198-b8b7-11eb-941d-5254004b7a69
[1517415] /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/tests/functional$ /tmp/venv.sU96b3Nguc/el8-functional/bin/ansible-playbook -vv -i hosts /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml -e ireallymeanit=yes -e fsid=4217f198-b8b7-11eb-941d-5254004b7a69
ansible-playbook 2.9.27
  config file = /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/ansible.cfg
  configured module search path = ['/home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/library']
  ansible python module location = /tmp/venv.sU96b3Nguc/el8-functional/lib/python3.9/site-packages/ansible
  executable location = /tmp/venv.sU96b3Nguc/el8-functional/bin/ansible-playbook
  python version = 3.9.7 (default, Sep 21 2021, 00:13:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
Using /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/ansible.cfg as config file
Skipping callback 'actionable', as we already have a stdout callback.
Skipping callback 'counter_enabled', as we already have a stdout callback.
Skipping callback 'debug', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'full_skip', as we already have a stdout callback.
Skipping callback 'json', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'null', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
Skipping callback 'selective', as we already have a stdout callback.
Skipping callback 'skippy', as we already have a stdout callback.
Skipping callback 'stderr', as we already have a stdout callback.
Skipping callback 'unixy', as we already have a stdout callback.
Skipping callback 'yaml', as we already have a stdout callback.

PLAYBOOK: cephadm-purge-cluster.yml ********************************************
7 plays in /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml

PLAY [check local prerequisites are in place] **********************************
META: ran handlers

TASK [fail if fsid was not provided] *******************************************
task path: /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml:22
Friday 23 September 2022  09:00:29 +0000 (0:00:00.015)       0:00:00.015 ****** 
skipping: [localhost] => changed=false 
  skip_reason: Conditional result was False

TASK [fail if admin group doesn't exist or is empty] ***************************
task path: /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml:29
Friday 23 September 2022  09:00:29 +0000 (0:00:00.023)       0:00:00.038 ****** 
skipping: [localhost] => changed=false 
  skip_reason: Conditional result was False

The issue with hosts: all is when users use the same inventory for multiple environments.

[1] https://docs.ansible.com/ansible/latest/inventory/implicit_localhost.html#implicit-localhost
[2] https://github.com/ceph/cephadm-ansible/blob/devel/tests/functional/hosts
[3] https://github.com/ceph/cephadm-ansible/blob/v2.9.0/tox.ini#L90
[4] https://2.jenkins.ceph.com/job/cephadm-ansible-prs-el8-functional/168/consoleFull
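
For context, a minimal sketch of the implicit-localhost behavior described in [1]; the inventory contents and play below are illustrative placeholders, not the actual CI inventory [2]:

# Hypothetical inventory with no localhost entry:
#
#   [admin]
#   node1
#
#   [ceph]
#   node1
#   node2
#
# A play targeting localhost still runs, because Ansible provides an
# implicit localhost when the inventory does not define one:
- name: check local prerequisites are in place
  hosts: localhost
  gather_facts: false
  tasks:
    - name: confirm the play ran on the implicit localhost
      debug:
        msg: "running on {{ inventory_hostname }}"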

@tobiasmcnulty
Contributor Author

@guits Thanks for the detailed reply. I did some additional testing and the issue may be my fault.

I am using a single inventory file for the Ceph servers and other servers that are not part of the cluster. I am not sure whether that is the same as "the issue with hosts: all is when users use the same inventory for multiple environments" or whether it is a different issue.

When deleting the cluster I tried to limit it to my ceph group:

ansible-playbook -i inventory -l ceph cephadm-ansible/cephadm-purge-cluster.yml -e fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492

This has the unfortunate side effect of skipping any plays that are limited to localhost (I edited the playbook to print a debug message instead, so as not to really delete my cluster):

$ ansible-playbook -i inventory -l ceph cephadm-ansible/cephadm-purge-cluster.yml -e fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492

PLAY [check local prerequisites are in place] *************************************************************************************************
skipping: no hosts matched

PLAY [check keyring is present on the admin host] *********************************************************************************************

TASK [check /etc/ceph/ceph.admin.client.keyring] **********************************************************************************************
ok: [loyal_mouse]

TASK [fail if /etc/ceph/admin.client.keyring is not present] **********************************************************************************
skipping: [loyal_mouse]

PLAY [check cluster hosts have cephadm and the required fsid bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492] ********************************************

TASK [check cephadm binary is available] ******************************************************************************************************
ok: [loyal_mouse]
ok: [tender_poodle]
ok: [tidy_piglet]

TASK [fail if cephadm is not available] *******************************************************************************************************
skipping: [loyal_mouse]
skipping: [tender_poodle]
skipping: [tidy_piglet]

TASK [check fsid directory given is valid across the cluster] *********************************************************************************
ok: [tender_poodle]
ok: [tidy_piglet]
ok: [loyal_mouse]

TASK [fail if the fsid directory is missing] **************************************************************************************************
skipping: [loyal_mouse]
skipping: [tender_poodle]
skipping: [tidy_piglet]

Are you sure you want to purge the cluster with fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492 ?
 [no]: no

PLAY [confirm whether user really wants to purge the cluster] *********************************************************************************
skipping: no hosts matched

PLAY [debug message] **************************************************************************************************************************

TASK [debug] **********************************************************************************************************************************
ok: [loyal_mouse] => {
    "msg": "continuing to purge the cluster anyways..."
}
ok: [tender_poodle] => {
    "msg": "continuing to purge the cluster anyways..."
}
ok: [tidy_piglet] => {
    "msg": "continuing to purge the cluster anyways..."
}

PLAY RECAP ************************************************************************************************************************************
loyal_mouse                : ok=4    changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0   
tender_poodle              : ok=3    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   
tidy_piglet                : ok=3    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  

Without the -l ceph, it would fail properly if I typed "no" at the prompt, but I don't get that far because it dies first on this message: fatal: [non_ceph_host]: FAILED! => {"changed": false, "msg": "The cephadm binary is missing on non_ceph_host. To purge the cluster you must have cephadm installed\non ALL ceph hosts. Install manually or use the preflight playbook.\n"}

If the goal is to only support an inventory file dedicated to the Ceph cluster, I understand, and I am happy to close this out / find a workaround on our side.

@tobiasmcnulty tobiasmcnulty changed the title Purge behavior when localhost host is not defined Purge behavior when calling cephadm-purge-cluster.yml with --limit Sep 27, 2022
@guits
Collaborator

guits commented Sep 28, 2022

After a closer look at the playbook, what I wanted to avoid with hosts: localhost is actually counter-productive when using a 'shared inventory' with --limit. By the way, you can see hosts: all is already used for multiple plays:

https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L56
https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L132
https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L147

I think the approach you suggested makes sense in the end.

If you want to send a PR... 🙂
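
For completeness, a hedged sketch of how the confirmation play could be adapted under this approach. The ireallymeanit variable name is taken from the CI command above (-e ireallymeanit=yes); the prompt text and fail message are illustrative, not the playbook's actual wording:

- name: confirm whether user really wants to purge the cluster
  hosts: all
  gather_facts: false
  become: false
  vars_prompt:
    - name: ireallymeanit
      prompt: Are you sure you want to purge the cluster?
      default: 'no'
      private: false
  tasks:
    - name: exit playbook if user did not mean to purge the cluster
      # vars_prompt asks only once per play run (and is skipped entirely when
      # the variable is passed with -e); delegate_to with run_once keeps the
      # check itself from repeating on every host matched by --limit.
      fail:
        msg: "Exiting cephadm-purge-cluster playbook: cluster was NOT purged."
      when: ireallymeanit != 'yes'
      delegate_to: localhost
      run_once: true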

asm0deuz added a commit to asm0deuz/cephadm-ansible that referenced this issue Nov 24, 2023

As some users have inventory files with not just Ceph hosts, they
use --limit to target only Ceph nodes.
Using "hosts: localhost" in playbooks prevents those tasks from being
executed. Using delegate_to with run_once fixes the issue.

Fixes: ceph#170

Signed-off-by: Teoman ONAY <[email protected]>
mergify bot pushed a commit that referenced this issue Nov 24, 2023

As some users have inventory files with not just Ceph hosts, they
use --limit to target only Ceph nodes.
Using "hosts: localhost" in playbooks prevents those tasks from being
executed. Using delegate_to with run_once fixes the issue.

Fixes: #170

Signed-off-by: Teoman ONAY <[email protected]>
(cherry picked from commit dd49ecc)