failure in upgrade_ha script from undefined HA state check #202

Open
honvl opened this issue Dec 1, 2023 · 0 comments
Labels
bug Something isn't working

Comments

honvl commented Dec 1, 2023

Describe the bug

The upgrade_ha.yml playbook fails when upgrading a PA3220 HA pair from 9.1 to 10.0, because the HA state check tries to parse a stdout that is undefined.

Expected behavior

The HA state task should keep retrying until the stdout of the state sync check shows the expected state, and it should treat an undefined stdout as a failed attempt rather than a fatal error.

Current behavior

TASK [Install target PAN-OS version and restart (primary)] *********************
task path: /tmp/awx_6394_nqqpuos9/project/upgrade_ha.yml:103
changed: [fw_px_pa3220-1_mgt.ad.dpw.com] => {"attempts": 1, "changed": true, "version": "10.0.11-h1"}
TASK [Pause for restart] *******************************************************
task path: /tmp/awx_6394_nqqpuos9/project/upgrade_ha.yml:113
Pausing for 30 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [fw_px_pa3220-1_mgt.ad.dpw.com] => {"changed": false, "delta": 30, "echo": true, "rc": 0, "start": "2023-12-01 12:19:01.659550", "stderr": "", "stdout": "Paused for 30.0 seconds", "stop": "2023-12-01 12:19:31.659867", "user_input": ""}
TASK [Chassis ready (primary)] *************************************************
task path: /tmp/awx_6394_nqqpuos9/project/upgrade_ha.yml:117
FAILED - RETRYING: Chassis ready (primary) (29 retries left).
FAILED - RETRYING: Chassis ready (primary) (28 retries left).
FAILED - RETRYING: Chassis ready (primary) (27 retries left).
FAILED - RETRYING: Chassis ready (primary) (26 retries left).
FAILED - RETRYING: Chassis ready (primary) (25 retries left).
FAILED - RETRYING: Chassis ready (primary) (24 retries left).
ok: [fw_px_pa3220-1_mgt.ad.dpw.com] => {"attempts": 8, "changed": false, "msg": "Done", "stdout": "{"response": {"@status": "success", "result": "yes"}}", "stdout_lines": ["{"response": {"@status": "success", "result": "yes"}}"], "stdout_xml": "<response status="success">yes\n"}
TASK [State sync check (primary)] **********************************************
task path: /tmp/awx_6394_nqqpuos9/project/upgrade_ha.yml:127
fatal: [fw_px_pa3220-1_mgt.ad.dpw.com]: FAILED! => {"msg": "The conditional check '( primary_state_sync.stdout | from_json).response.result.group["local-info"].state == 'passive' and ( primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == 'Complete'' failed. The error was: Unexpected templating type error occurred on ({% if ( primary_state_sync.stdout | from_json).response.result.group["local-info"].state == 'passive' and ( primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == 'Complete' %} True {% else %} False {% endif %}): expected string or buffer"}

Possible solution

A successful chassis-ready task may not guarantee that the firewall can already return output for the HA state task, especially on major version upgrades (9.1 -> 10.0) or when the HA state query runs too soon after the chassis becomes ready.

One option is to add a task that waits until stdout is defined before the existing task parses JSON from it:

    - name: State sync check (primary) not empty
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show high-availability state'
      register: primary_state_sync
      retries: 10
      delay: 30
      until: primary_state_sync.stdout is defined

Alternatively, guard the existing until condition so that an undefined stdout counts as a failed attempt and the HA state task's retry loop continues.
ChatGPT suggested the following:

    - name: State sync check (primary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show high-availability state'
      register: primary_state_sync
      retries: 10
      delay: 30
      until: >
        primary_state_sync.stdout is defined and
        (
          (primary_state_sync.stdout | from_json).response.result.group["local-info"].state == 'passive' and
          (primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == 'Complete'
        )
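A possible refinement of the suggestion above, offered as an untested sketch rather than a confirmed fix. The error text "expected string or buffer" indicates that from_json received a non-string value, so a stdout that is defined but null would still fail. The variant below also applies Jinja2's built-in string test and a length check. It relies on "and" short-circuiting, so from_json only runs on a non-empty string. It assumes the same primary provider variable and the same response layout as the original task.

```yaml
- name: State sync check (primary)
  paloaltonetworks.panos.panos_op:
    provider: '{{ primary }}'
    cmd: 'show high-availability state'
  register: primary_state_sync
  retries: 10
  delay: 30
  # Sketch only: the first three conditions are false for an undefined,
  # null, or empty stdout, so from_json is never called on such a value
  # and the attempt is simply retried.
  until: >
    primary_state_sync.stdout is defined and
    primary_state_sync.stdout is string and
    primary_state_sync.stdout | length > 0 and
    (primary_state_sync.stdout | from_json).response.result.group["local-info"].state == 'passive' and
    (primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == 'Complete'
```

If the firewall never returns a usable stdout, the task should fail only once the 10 retries are exhausted, not abort on the first attempt.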

Steps to reproduce

Upgrade a PA3220 HA pair from 9.1.14-h4 to 10.0.11-h1.
The issue did not occur when the same pair was then upgraded from 10.0.11-h1 to 10.1.10-h2.

Context

The failure broke a planned automated upgrade and required manual script editing and user intervention to resume at the previous step.
The state sync check did succeed once the playbook was rerun from the failed step.

Environment

The failing task as currently defined (upgrade_ha.yml:127):

    - name: State sync check (primary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show high-availability state'
      register: primary_state_sync
      retries: 10
      delay: 30
      until: ( primary_state_sync.stdout | from_json).response.result.group["local-info"].state == 'passive' and
             ( primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == 'Complete'

Ansible version:
ansible-playbook 2.9.27
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/var/lib/awx/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.5 (default, May 30 2023, 03:38:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

honvl added the bug label Dec 1, 2023
honvl changed the title from "failure in upgrade_ha script" to "failure in upgrade_ha script from undefined HA state check" Dec 1, 2023