Hybrid deployments #563

Draft, wants to merge 2 commits into base: main
1 change: 1 addition & 0 deletions README.md
@@ -136,6 +136,7 @@ Make sure to set/review the following vars:
| `lab_cloud` | the cloud within the lab environment for Red Hat Performance labs (Example: `cloud42`)
| `cluster_type` | either `mno`, or `sno` for the respective cluster layout
| `worker_node_count` | applies to mno cluster type for the desired worker count, ideal for leaving left over inventory hosts for other purposes
| `hybrid_worker_count` | applies to mno cluster type for the desired virtual worker count; HV nodes and VMs must already be set up
| `bastion_lab_interface` | set to the bastion machine's lab accessible interface
| `bastion_controlplane_interface` | set to the interface in which the bastion will be networked to the deployed ocp cluster
| `controlplane_lab_interface` | applies to mno cluster type and should map to the nodes interface in which the lab provides dhcp to and also required for public routable vlan based sno deployment(to disable this interface)
4 changes: 4 additions & 0 deletions ansible/mno-deploy.yml
@@ -24,6 +24,10 @@
  vars:
    inventory_group: worker
    index: "{{ worker_node_count }}"
- role: boot-iso
  vars:
    inventory_group: hv_vm
    index: "{{ hybrid_worker_count }}"
Comment on lines +27 to +30
Member @akrzos (Jan 2, 2025):

In the context of someone running an ACM scale test where all of the hv_vm entries are actually, say, SNOs, does this task dump thousands of lines of skipped tasks, or does it just skip the role? If it dumps thousands of lines, I think we should revisit how this is performed, perhaps using a different inventory group.

Collaborator Author:

You're right, it does dump thousands of lines. I had looked at adding a loop_var to this at one point to make the output more meaningful.

Member:

Instead of including another boot-iso role here over the hv_vm workers, I'm wondering if we could just copy the desired number of hv_vm entries under workers. I will think of a more automated way to accomplish this as well. WDYT?
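
A rough sketch of the "different inventory group" idea from this thread, offered as an illustration rather than the PR's approach. It assumes a hypothetical in-memory group named `hybrid_worker` (not defined anywhere in this PR) that is filled with `add_host` before the roles run, so the boot-iso role is pointed at only the VMs that will become workers. Whether this actually reduces the skipped-task output would need testing.

```yaml
# Sketch only: `hybrid_worker` is a hypothetical group name, not part of this PR.
pre_tasks:
  # add_host on an existing inventory host adds it to the extra group in memory.
  - name: Collect the first hybrid_worker_count hv_vm hosts into their own group
    ansible.builtin.add_host:
      name: "{{ item }}"
      groups: hybrid_worker
    loop: "{{ groups['hv_vm'][:hybrid_worker_count | int] }}"

roles:
  - role: boot-iso
    vars:
      inventory_group: hybrid_worker
      index: "{{ hybrid_worker_count }}"
```

When `hybrid_worker_count` is 0 the loop is empty and the group is never created, so the role's `groups[inventory_group]` lookup would still need a guard; that part is left open here.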

- wait-hosts-discovered
- configure-local-storage
- install-cluster
@@ -12,6 +12,7 @@ ASSISTED_SERVICE_HOST={{ assisted_installer_host }}:{{ assisted_installer_port }
IMAGE_SERVICE_BASE_URL=http://{{ assisted_installer_host }}:{{ assisted_image_service_port }}
LISTEN_PORT={{ assisted_image_service_port }}
DEPLOY_TARGET=onprem
DEPLOY_TYPE="Podman"
STORAGE=filesystem
DUMMY_IGNITION=false

90 changes: 90 additions & 0 deletions ansible/roles/boot-iso/tasks/libvirt.yml
@@ -0,0 +1,90 @@
---
# Libvirt tasks for booting an iso
# Couldn't use ansible redfish_command because it requires a username and password.
# URLs modeled from http://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html

- name: Libvirt - Power down machine prior to booting iso
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}/Actions/ComputerSystem.Reset"
    method: POST
    headers:
      content-type: application/json
      Accept: application/json
    body: {"ResetType":"ForceOff"}
    body_format: json
    validate_certs: no
    status_code: 204
    return_content: yes
  register: redfish_forceoff

- name: Libvirt - Pause for power down
  pause:
    seconds: 1
  when: not redfish_forceoff.failed
Member:

Might be able to use a "check for powered down" type of task here instead of a sleep.

Member:

I was able to remove this entirely on my tested deployments (3, 27, and 54 VMs)
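
The "check for powered down" suggestion above could look roughly like the task below, which polls the same Systems resource until it reports that the machine is off. This is a sketch under assumptions: it assumes the emulator returns the standard Redfish `PowerState` property on the system resource, and the `redfish_power_state` register name and the retry and delay values are illustrative.

```yaml
# Sketch only: the register name and retry/delay values are assumptions.
- name: Libvirt - Wait until the machine reports powered off
  ansible.builtin.uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}"
    method: GET
    headers:
      Accept: application/json
    status_code: 200
    return_content: true
  register: redfish_power_state
  # Tolerate a response without a JSON body by falling back to an empty dict.
  until: (redfish_power_state.json | default({})).PowerState | default('') == 'Off'
  retries: 10
  delay: 1
```

Compared with a fixed pause, this only waits as long as the power-off actually takes and fails visibly if the machine never reaches `Off`.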


- name: Libvirt - Set OneTimeBoot VirtualCD
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}"
    method: PATCH
    headers:
      content-type: application/json
      Accept: application/json
    body: { "Boot": { "BootSourceOverrideTarget": "Cd", "BootSourceOverrideMode": "UEFI", "BootSourceOverrideEnabled": "Continuous" } }
    body_format: json
    validate_certs: no
    status_code: 204
    return_content: yes

- name: Libvirt - Check for Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd"
Member:

I had to change Managers to Systems because Managers didn't work 🤣

    method: Get
    headers:
      content-type: application/json
      Accept: application/json
    body: {}
    body_format: json
    validate_certs: no
    status_code: 200
    return_content: yes
  register: check_virtual_media

- name: Libvirt - Eject any CD Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.EjectMedia"
Member:

Same here with Managers

    method: POST
    headers:
      content-type: application/json
      Accept: application/json
    body: {}
    body_format: json
    validate_certs: no
    status_code: 204
    return_content: yes
  when: check_virtual_media.json.Image

- name: Libvirt - Insert virtual media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia"
Member:

Same here with Managers

    method: POST
    headers:
      content-type: application/json
      Accept: application/json
    body: {"Image":"http://{{ http_store_host }}:{{ http_store_port }}/{{ hostvars[item]['boot_iso'] }}", "Inserted": true}
    body_format: json
    validate_certs: no
    status_code: 204
    return_content: yes

- name: Libvirt - Power on
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}/Actions/ComputerSystem.Reset"
    method: POST
    headers:
      content-type: application/json
      Accept: application/json
    body: {"ResetType":"On"}
    body_format: json
    validate_certs: no
    status_code: 204
    return_content: yes
6 changes: 6 additions & 0 deletions ansible/roles/boot-iso/tasks/main.yml
@@ -24,3 +24,9 @@
  with_items:
    - "{{ groups[inventory_group][:index|int] }}"
  when: hostvars[item]['vendor'] == 'Lenovo'

- name: Boot iso on libvirt vm
  include_tasks: libvirt.yml
  with_items:
    - "{{ groups[inventory_group][:index|int] }}"
  when: hostvars[item]['vendor'] == 'Libvirt'
12 changes: 11 additions & 1 deletion ansible/roles/create-ai-cluster/tasks/main.yml
@@ -32,6 +32,13 @@
    - cluster_type == "mno"
  loop: "{{ groups['worker'] }}"

- name: MNO / Hybrid (VM Workers) - Populate static network configuration with VM worker nodes
  include_tasks: static_network_config.yml
  when:
    - cluster_type == "mno"
    - hybrid_worker_count > 0
  loop: "{{ groups['hv_vm'][:hybrid_worker_count] }}"

# - debug:
#     msg: "{{ static_network_config }}"

@@ -52,7 +59,10 @@
"pull_secret": "{{ pull_secret | to_json }}",
"ssh_public_key": "{{ lookup('file', ssh_public_key_file) }}",
"vip_dhcp_allocation": "{{ vip_dhcp_allocation }}",
"additional_ntp_source": "{{ bastion_controlplane_ip if use_bastion_registry else labs[lab]['ntp_server'] }}"
"additional_ntp_source": "{{ bastion_controlplane_ip if use_bastion_registry else labs[lab]['ntp_server'] }}",
"api_vips": [{"ip": "{{ controlplane_network_api }}"}],
"ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
"network_type": "{{ networktype }}"
Comment on lines +62 to +65
Member:

I believe we need to revert this.

}
register: create_cluster_return

@@ -2,9 +2,11 @@
{
"mac_address": "{{ hostvars[item]['mac_address'] }}",
"logical_nic_name": "{{ hostvars[item]['network_interface'] }}"
{% if 'lab_mac' in hostvars[item] %}
},
{
"mac_address": "{{ hostvars[item]['lab_mac'] }}",
"logical_nic_name": "{{ hostvars[item]['lab_interface'] }}"
{% endif %}
}
]
15 changes: 13 additions & 2 deletions ansible/roles/create-inventory/templates/inventory-mno.j2
@@ -86,26 +86,37 @@ network_prefix={{ controlplane_network_prefix }}
{% for hv in ocpinventory_hv_nodes %}
{% set hv_loop = loop %}
{% for vm in range(hw_vm_counts[lab][(hv.pm_addr.split('.')[0]).split('-')[-1]]['default']) %}
{{ hv_vm_prefix }}{{ '%05d' % ctr.vm }} ansible_host={{ hv.pm_addr | replace('mgmt-','') }} hv_ip={{ controlplane_network | ansible.utils.nthhost(hv_loop.index + ocpinventory_worker_nodes|length + mno_worker_node_offset + hv_ip_offset) }} ip={{ controlplane_network | ansible.utils.nthhost(hv_vm_ip_offset + ctr.vm - 1) }} cpus={{ hv_vm_cpu_count }} memory={{ hv_vm_memory_size }} disk_size={{ hv_vm_disk_size }} vnc_port={{ 5900 + loop.index }} mac_address={{ (90520730730496 + ctr.vm) | ansible.utils.hwaddr('linux') }} domain_uuid={{ ctr.vm | to_uuid }} disk_location=/var/lib/libvirt/images bw_avg={{ hv_vm_bandwidth_average }} bw_peak={{ hv_vm_bandwidth_peak }} bw_burst={{ hv_vm_bandwidth_burst }}
{{ hv_vm_prefix }}{{ '%05d' % ctr.vm }} ansible_host={{ hv.pm_addr | replace('mgmt-','') }} hv_ip={{ controlplane_network | ansible.utils.nthhost(hv_loop.index + ocpinventory_worker_nodes|length + mno_worker_node_offset + hv_ip_offset) }} ip={{ controlplane_network | ansible.utils.nthhost(hv_vm_ip_offset + ctr.vm - 1) }} cpus={{ hv_vm_cpu_count }} memory={{ hv_vm_memory_size }} disk_size={{ hv_vm_disk_size }} vnc_port={{ 5900 + loop.index }} mac_address={{ (90520730730496 + ctr.vm) | ansible.utils.hwaddr('linux') }} domain_uuid={{ ctr.vm | to_uuid }} disk_location=/var/lib/libvirt/images bw_avg={{ hv_vm_bandwidth_average }} bw_peak={{ hv_vm_bandwidth_peak }} bw_burst={{ hv_vm_bandwidth_burst }} vendor=Libvirt install_disk=/dev/sda
{% set ctr.vm = ctr.vm + 1 %}
{% endfor %}
{% if hv.disk2_enable %}
{% for vm in range(hw_vm_counts[lab][(hv.pm_addr.split('.')[0]).split('-')[-1]][hv.disk2_device]) %}
{{ hv_vm_prefix }}{{ '%05d' % ctr.vm }} ansible_host={{ hv.pm_addr | replace('mgmt-','') }} hv_ip={{ controlplane_network | ansible.utils.nthhost(hv_loop.index + ocpinventory_worker_nodes|length + mno_worker_node_offset + hv_ip_offset) }} ip={{ controlplane_network | ansible.utils.nthhost(hv_vm_ip_offset + ctr.vm - 1) }} cpus={{ hv_vm_cpu_count }} memory={{ hv_vm_memory_size }} disk_size={{ hv_vm_disk_size }} vnc_port={{ 5900 + loop.index + hw_vm_counts[lab][(hv.pm_addr.split('.')[0]).split('-')[-1]]['default'] }} mac_address={{ (90520730730496 + ctr.vm) | ansible.utils.hwaddr('linux') }} domain_uuid={{ ctr.vm | to_uuid }} disk_location={{ disk2_mount_path }}/libvirt/images bw_avg={{ hv_vm_bandwidth_average }} bw_peak={{ hv_vm_bandwidth_peak }} bw_burst={{ hv_vm_bandwidth_burst }}
{{ hv_vm_prefix }}{{ '%05d' % ctr.vm }} ansible_host={{ hv.pm_addr | replace('mgmt-','') }} hv_ip={{ controlplane_network | ansible.utils.nthhost(hv_loop.index + ocpinventory_worker_nodes|length + mno_worker_node_offset + hv_ip_offset) }} ip={{ controlplane_network | ansible.utils.nthhost(hv_vm_ip_offset + ctr.vm - 1) }} cpus={{ hv_vm_cpu_count }} memory={{ hv_vm_memory_size }} disk_size={{ hv_vm_disk_size }} vnc_port={{ 5900 + loop.index + hw_vm_counts[lab][(hv.pm_addr.split('.')[0]).split('-')[-1]]['default'] }} mac_address={{ (90520730730496 + ctr.vm) | ansible.utils.hwaddr('linux') }} domain_uuid={{ ctr.vm | to_uuid }} disk_location={{ disk2_mount_path }}/libvirt/images bw_avg={{ hv_vm_bandwidth_average }} bw_peak={{ hv_vm_bandwidth_peak }} bw_burst={{ hv_vm_bandwidth_burst }} vendor=Libvirt install_disk=/dev/sda
{% set ctr.vm = ctr.vm + 1 %}
{% endfor %}
{% endif %}

{% endfor %}

[hv_vm:vars]
role=worker
ansible_user=root
ansible_ssh_pass={{ hv_ssh_pass }}
base_domain={{ base_dns_name }}
machine_network={{ controlplane_network }}
network_prefix={{ controlplane_network_prefix }}
gateway={{ controlplane_network_gateway }}
bw_limit={{ hv_vm_bandwidth_limit }}

boot_iso=discovery.iso
lab_interface={{ controlplane_lab_interface }}
network_interface={{ controlplane_network_interface }}
{% if controlplane_bastion_as_dns %}
dns1={{ bastion_controlplane_ip }}
{% else %}
dns1={{ labs[lab]['dns'][0] }}
dns2={{ labs[lab]['dns'][1] | default('') }}
{% endif %}
{% else %}
[hv]
# Set `hv_inventory: true` to populate
2 changes: 1 addition & 1 deletion ansible/roles/mno-post-cluster-install/tasks/main.yml
@@ -154,7 +154,7 @@
- name: Label the worker nodes
  shell: |
    KUBECONFIG={{ bastion_cluster_config_dir }}/kubeconfig oc label no --overwrite {{ item }} localstorage=true prometheus=true
  with_items: "{{ groups['worker'] }}"
  with_items: "{{ groups['worker'] + groups['hv_vm'][:hybrid_worker_count] }}"

- name: Install local-storage operator
  shell:
40 changes: 1 addition & 39 deletions ansible/roles/wait-hosts-discovered/tasks/main.yml
@@ -3,7 +3,7 @@

- name: MNO - Create list of nodes to be discovered
  set_fact:
    inventory_nodes: "{{ groups['controlplane'] + groups['worker'] }}"
    inventory_nodes: "{{ groups['controlplane'] + groups['worker'] + groups['hv_vm'][:hybrid_worker_count] }}"
  when: cluster_type == "mno"

- name: SNO - Create list of nodes to be discovered
@@ -55,44 +55,6 @@
  loop_control:
    loop_var: discovered_host

- name: Patch cluster network settings
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
        "cluster_networks": [
          {
            "cidr": "{{ cluster_network_cidr }}",
            "cluster_id": "{{ ai_cluster_id }}",
            "host_prefix": "{{ cluster_network_host_prefix }}"
          }
        ],
        "service_networks": [
          {
            "cidr": "{{ service_network_cidr }}",
            "cluster_id": "{{ ai_cluster_id }}",
          }
        ]
      }

- name: Patch cluster ingress/api vip addresses
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
        "cluster_network_host_prefix": "{{ cluster_network_host_prefix }}",
        "vip_dhcp_allocation": "{{ vip_dhcp_allocation }}",
        "ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
        "api_vips": [{"ip": "{{ controlplane_network_api }}"}],
        "network_type": "{{ networktype }}"
      }

Comment on lines -58 to -95
Member:

I believe this needs to be reverted as I was unable to make a cluster without including this.

- name: Wait for cluster to be ready
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
2 changes: 2 additions & 0 deletions ansible/vars/all.sample.yml
@@ -13,6 +13,8 @@ cluster_type:

# Applies to mno clusters
worker_node_count:
# If HVs are set up, how many workers are VMs
hybrid_worker_count: 0

# Enter whether the build should use 'dev' (early candidate builds) or 'ga' for Generally Available versions of OpenShift
# Empty value results in playbook failing with error message. Example of dev builds would be 'candidate-4.17', 'candidate-4.16'