Implement playbook to recover vmhost server automatically #2800
Conversation
This commit introduces several playbooks to recover a vmhost server automatically.
Signed-off-by: bingwang <[email protected]>
    src_disk_image: "{{ home_path }}/{{ root_path }}/images/{{ hdd_image_filename }}"
    disk_image: "{{ home_path }}/{{ root_path }}/disks/{{ vm_name }}_hdd.vmdk"
    cdrom_image: "{{ home_path }}/{{ root_path }}/images/{{ cd_image_filename }}"
  when: '"kickstart_code" in kickstart_output and kickstart_output.kickstart_code != 0'
Since almost all tasks in respin_vm.yml need escalated privileges, it would be cleaner to apply become like:
- name: Respin failed vm
include_tasks: respin_vm.yml
vars:
src_disk_image: "{{ home_path }}/{{ root_path }}/images/{{ hdd_image_filename }}"
disk_image: "{{ home_path }}/{{ root_path }}/disks/{{ vm_name }}_hdd.vmdk"
cdrom_image: "{{ home_path }}/{{ root_path }}/images/{{ cd_image_filename }}"
apply:
become: True
when: '"kickstart_code" in kickstart_output and kickstart_output.kickstart_code != 0'
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/include_tasks_module.html
# This playbook will cleanup a vm_host, including removing all veos, containers and net bridges.

- hosts: servers:&vm_host
  gather_facts: no
Same as above: apply become here as well.
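For the cleanup play above, the same effect can be achieved at the play level. A minimal sketch (illustrative, not the exact change in the PR):

```yaml
# Apply escalated privileges to every task in the play,
# instead of per-task or via apply on include_tasks.
- hosts: servers:&vm_host
  gather_facts: no
  become: true
```

Setting become at the play level keeps the individual tasks free of repeated privilege directives.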
  set_fact:
    kickstart_failed_vms: "{{ kickstart_failed_vms + [vm_name] }}"
  when: '"kickstart_code" in kickstart_output_final and kickstart_output_final.kickstart_code != 0'
  when: '"kickstart_code" in kickstart_output and kickstart_output.kickstart_code != 0'
Is it possible to retry the respin if one round of respin still fails? To avoid endless retries, we could add a max retry limit, e.g. 3 times.
Good suggestion. I'll make this change in the next PR.
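One way to sketch such a bounded retry in Ansible, assuming respin_vm.yml re-registers kickstart_output on each attempt (the variable max_respin_retries below is hypothetical, not from this PR):

```yaml
# Re-run respin_vm.yml up to a fixed number of times.
# The when condition is re-evaluated per loop item, so once a
# respin succeeds (kickstart_code == 0), later iterations are skipped.
- name: Respin failed vm with bounded retries
  include_tasks: respin_vm.yml
  apply:
    become: true
  loop: "{{ range(0, max_respin_retries | default(3)) | list }}"
  when: '"kickstart_code" in kickstart_output and kickstart_output.kickstart_code != 0'
```

This avoids an endless retry loop while still giving transient failures a few chances to recover.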
Signed-off-by: bingwang [email protected]
Description of PR
Summary:
Fixes # (issue)
This commit introduces several playbooks to recover a vmhost server automatically.
It's extremely time-consuming to redeploy all testbeds on a host server if the server is down or rebooted. This PR adds a new option in
testbed-cli.sh
to do a cleanup of the host server, and adds a respin of VMs that failed to start.

Type of change
Approach
What is the motivation for this PR?
This PR is to add new playbooks to support auto testbed recovery.
How did you do it?
How did you verify/test it?
Verified in starlab.
Any platform specific information?
No.
Supported testbed topology if it's a new test case?
No.
Documentation