
Purge behavior when calling cephadm-purge-cluster.yml with --limit #170

Closed
tobiasmcnulty opened this issue Sep 25, 2022 · 3 comments · Fixed by #259
@tobiasmcnulty
Contributor

tobiasmcnulty commented Sep 25, 2022

The cephadm-purge-cluster.yml pre-purge checks fail in a not-so-good way when one calls the playbook with a --limit to a subset of the hosts in the inventory (e.g., those hosts that are in the cluster).

As an alternative, the plays could have hosts: all set and the following added to each task to achieve the same behavior in a compatible way:

      delegate_to: localhost
      run_once: true

Otherwise, one sees skipping: no hosts matched on the applicable plays. The "confirm whether user really wants to purge the cluster" play even prompts the user for confirmation, but then continues to purge the cluster no matter what they type.

If this sounds like an acceptable solution I can make a PR.
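
For illustration, here is a minimal sketch of what one of the pre-purge check plays could look like with that change. The play and task names are taken from the output below; the exact fail message and condition are placeholders, not the actual contents of cephadm-purge-cluster.yml:

- name: check local prerequisites are in place
  hosts: all
  gather_facts: false
  become: false
  tasks:
    - name: fail if fsid was not provided
      # Runs once, on the controller, even when the playbook is invoked
      # with --limit, because any limited host still matches hosts: all.
      fail:
        msg: "You must pass the fsid of the cluster to purge (-e fsid=...)"
      when: fsid is undefined
      delegate_to: localhost
      run_once: true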

@guits
Collaborator

guits commented Sep 27, 2022

That shouldn't be an issue [1].

By the way, this is what we do in the CI. If you look at the inventory we use here [2], you can see we don't define a localhost host, and we use it here [3].

See a job in the CI [4]:

el8-functional run-test: commands[10] | ansible-playbook -vv -i /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/tests/functional/hosts /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml -e ireallymeanit=yes -e fsid=4217f198-b8b7-11eb-941d-5254004b7a69
[1517415] /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/tests/functional$ /tmp/venv.sU96b3Nguc/el8-functional/bin/ansible-playbook -vv -i hosts /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml -e ireallymeanit=yes -e fsid=4217f198-b8b7-11eb-941d-5254004b7a69
ansible-playbook 2.9.27
  config file = /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/ansible.cfg
  configured module search path = ['/home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/library']
  ansible python module location = /tmp/venv.sU96b3Nguc/el8-functional/lib/python3.9/site-packages/ansible
  executable location = /tmp/venv.sU96b3Nguc/el8-functional/bin/ansible-playbook
  python version = 3.9.7 (default, Sep 21 2021, 00:13:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
Using /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/ansible.cfg as config file
Skipping callback 'actionable', as we already have a stdout callback.
Skipping callback 'counter_enabled', as we already have a stdout callback.
Skipping callback 'debug', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'full_skip', as we already have a stdout callback.
Skipping callback 'json', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'null', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
Skipping callback 'selective', as we already have a stdout callback.
Skipping callback 'skippy', as we already have a stdout callback.
Skipping callback 'stderr', as we already have a stdout callback.
Skipping callback 'unixy', as we already have a stdout callback.
Skipping callback 'yaml', as we already have a stdout callback.

PLAYBOOK: cephadm-purge-cluster.yml ********************************************
7 plays in /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml

PLAY [check local prerequisites are in place] **********************************
META: ran handlers

TASK [fail if fsid was not provided] *******************************************
task path: /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml:22
Friday 23 September 2022  09:00:29 +0000 (0:00:00.015)       0:00:00.015 ****** 
skipping: [localhost] => changed=false 
  skip_reason: Conditional result was False

TASK [fail if admin group doesn't exist or is empty] ***************************
task path: /home/jenkins-build/build/workspace/cephadm-ansible-prs-el8-functional/cephadm-purge-cluster.yml:29
Friday 23 September 2022  09:00:29 +0000 (0:00:00.023)       0:00:00.038 ****** 
skipping: [localhost] => changed=false 
  skip_reason: Conditional result was False

The issue with hosts: all is when users use the same inventory for multiple environments.

[1] https://docs.ansible.com/ansible/latest/inventory/implicit_localhost.html#implicit-localhost
[2] https://github.com/ceph/cephadm-ansible/blob/devel/tests/functional/hosts
[3] https://github.com/ceph/cephadm-ansible/blob/v2.9.0/tox.ini#L90
[4] https://2.jenkins.ceph.com/job/cephadm-ansible-prs-el8-functional/168/consoleFull
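
For context, a minimal sketch of the implicit-localhost behavior described in [1]; the inventory contents and play below are illustrative placeholders, not the actual CI inventory [2]:

# Hypothetical inventory with no localhost entry:
#
#   [admin]
#   node1
#
#   [ceph]
#   node1
#   node2
#
# A play targeting localhost still runs, because Ansible provides an
# implicit localhost when the inventory does not define one:
- name: check local prerequisites are in place
  hosts: localhost
  gather_facts: false
  tasks:
    - name: confirm the play ran on the implicit localhost
      debug:
        msg: "running on {{ inventory_hostname }}"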

@tobiasmcnulty
Contributor Author

@guits Thanks for the detailed reply. I did some additional testing and the issue may be my fault.

I am using a single inventory file for the Ceph servers and other servers that are not part of the cluster. I am not sure whether that is the same as "the issue with hosts: all is when users use the same inventory for multiple environments" or whether it is a different issue.

When deleting the cluster I tried to limit it to my ceph group:

ansible-playbook -i inventory -l ceph cephadm-ansible/cephadm-purge-cluster.yml -e fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492

This has the unfortunate side effect of skipping any plays that are limited to localhost (I edited the playbook to print a debug message instead, so as not to really delete my cluster):

$ ansible-playbook -i inventory -l ceph cephadm-ansible/cephadm-purge-cluster.yml -e fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492

PLAY [check local prerequisites are in place] *************************************************************************************************
skipping: no hosts matched

PLAY [check keyring is present on the admin host] *********************************************************************************************

TASK [check /etc/ceph/ceph.admin.client.keyring] **********************************************************************************************
ok: [loyal_mouse]

TASK [fail if /etc/ceph/admin.client.keyring is not present] **********************************************************************************
skipping: [loyal_mouse]

PLAY [check cluster hosts have cephadm and the required fsid bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492] ********************************************

TASK [check cephadm binary is available] ******************************************************************************************************
ok: [loyal_mouse]
ok: [tender_poodle]
ok: [tidy_piglet]

TASK [fail if cephadm is not available] *******************************************************************************************************
skipping: [loyal_mouse]
skipping: [tender_poodle]
skipping: [tidy_piglet]

TASK [check fsid directory given is valid across the cluster] *********************************************************************************
ok: [tender_poodle]
ok: [tidy_piglet]
ok: [loyal_mouse]

TASK [fail if the fsid directory is missing] **************************************************************************************************
skipping: [loyal_mouse]
skipping: [tender_poodle]
skipping: [tidy_piglet]

Are you sure you want to purge the cluster with fsid=bcf88dfa-3cf0-11ed-b5a4-b19cff2a0492 ?
 [no]: no

PLAY [confirm whether user really wants to purge the cluster] *********************************************************************************
skipping: no hosts matched

PLAY [debug message] **************************************************************************************************************************

TASK [debug] **********************************************************************************************************************************
ok: [loyal_mouse] => {
    "msg": "continuing to purge the cluster anyways..."
}
ok: [tender_poodle] => {
    "msg": "continuing to purge the cluster anyways..."
}
ok: [tidy_piglet] => {
    "msg": "continuing to purge the cluster anyways..."
}

PLAY RECAP ************************************************************************************************************************************
loyal_mouse                : ok=4    changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0   
tender_poodle              : ok=3    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0   
tidy_piglet                : ok=3    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0  

Without the -l ceph, it would fail properly if I typed "no" at the prompt, but I don't get that far because it dies first on this message: fatal: [non_ceph_host]: FAILED! => {"changed": false, "msg": "The cephadm binary is missing on non_ceph_host. To purge the cluster you must have cephadm installed\non ALL ceph hosts. Install manually or use the preflight playbook.\n"}

If the goal is to only support an inventory file dedicated to the Ceph cluster, I understand, and I am happy to close this out / find a workaround on our side.

@tobiasmcnulty tobiasmcnulty changed the title Purge behavior when localhost host is not defined Purge behavior when calling cephadm-purge-cluster.yml with --limit Sep 27, 2022
@guits
Collaborator

guits commented Sep 28, 2022

After a closer look at the playbook, what I wanted to avoid with hosts: localhost is actually counter-productive when using a 'shared inventory' with --limit. By the way, you can see hosts: all is already used for multiple plays:

https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L56
https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L132
https://github.com/ceph/cephadm-ansible/blob/devel/cephadm-purge-cluster.yml#L147

I think the approach you suggested makes sense in the end.

If you want to send a PR... 🙂
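
For completeness, a hedged sketch of how the confirmation play could be adapted under this approach. The ireallymeanit variable name is taken from the CI command above (-e ireallymeanit=yes); the prompt text and fail message are illustrative, not the playbook's actual wording:

- name: confirm whether user really wants to purge the cluster
  hosts: all
  gather_facts: false
  become: false
  vars_prompt:
    - name: ireallymeanit
      prompt: Are you sure you want to purge the cluster?
      default: 'no'
      private: false
  tasks:
    - name: exit playbook if user did not mean to purge the cluster
      # vars_prompt asks only once per play run (and is skipped entirely when
      # the variable is passed with -e); delegate_to with run_once keeps the
      # check itself from repeating on every host matched by --limit.
      fail:
        msg: "Exiting cephadm-purge-cluster playbook: cluster was NOT purged."
      when: ireallymeanit != 'yes'
      delegate_to: localhost
      run_once: true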

asm0deuz added a commit to asm0deuz/cephadm-ansible that referenced this issue Nov 24, 2023

As some users have inventory files with not just Ceph hosts, they
use --limit to target only Ceph nodes.
Using "hosts: localhost" in playbooks prevents those tasks from being
executed. Using delegate_to with run_once fixes the issue.

Fixes: ceph#170

Signed-off-by: Teoman ONAY <[email protected]>
mergify bot pushed a commit that referenced this issue Nov 24, 2023

As some users have inventory files with not just Ceph hosts, they
use --limit to target only Ceph nodes.
Using "hosts: localhost" in playbooks prevents those tasks from being
executed. Using delegate_to with run_once fixes the issue.

Fixes: #170

Signed-off-by: Teoman ONAY <[email protected]>
(cherry picked from commit dd49ecc)