Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract KSM management task #160

Merged

Conversation

bogdando
Copy link
Contributor

@bogdando bogdando commented May 19, 2023

KSM is not Nova compute specific, manage it in the
generic EDPM kernel role's deidcated task instead.

Also drop the RHEL8 family specific code path as
there will only be RHEL>=9 in NextGen EDPM, and
rework the tripleo-specific approach for using handlers
to remove the KSM package after disabling the services.

Closes: OSPRH-137

@openshift-ci openshift-ci bot requested review from fultonj and rabi May 19, 2023 13:32
@bogdando bogdando requested review from rebtoor and SeanMooney and removed request for rabi and fultonj May 19, 2023 13:32
Copy link
Contributor

@SeanMooney SeanMooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall i think this is ok but I'm not sure how large the scope of Bootstrap is intended to be. currently its pretty small so its probably ok.

i also agree with keeping ksm off by default.
it can be useful in some cases but for most of our customer i think off is the right default as they wont want the CPU overhead and non-determinisum cause by ksm.

@bogdando
Copy link
Contributor Author

bogdando commented May 19, 2023

overall i think this is ok but I'm not sure how large the scope of Bootstrap is intended to be. currently its pretty small so its probably ok.

i also agree with keeping ksm off by default. it can be useful in some cases but for most of our customer i think off is the right default as they wont want the CPU overhead and non-determinisum cause by ksm.

yes, I believe such tiny tasks do not need dedicated roles, and fits the (host) bootstrap task ideally. This also aligns with the "Tripleo future" spec, where we agreed to cover things like that by the bootstrap task

@bogdando bogdando requested review from slagle and kajinamit May 19, 2023 13:44
@kajinamit
Copy link
Contributor

I should have checked the comments here before I posted the review.

yes, I believe such tiny tasks do not need dedicated roles, and fits the (host) bootstrap task ideally. This also aligns with the "Tripleo future" spec, where we agreed to cover things like that by the bootstrap task

I'm not too sure this is really true. For example we have a separate role for kernel parameters. What you are saying would mean we agreed to merge even edpm_kernel role to bootstrap role but I don't think implementing more logics directly in boostrap is better for readability and maintainability.

roles/edpm_bootstrap/defaults/main.yml Outdated Show resolved Hide resolved
@bogdando bogdando changed the title Extract KSM management task to bootstrap [WIP] Extract KSM management task to bootstrap May 23, 2023
@bogdando
Copy link
Contributor Author

bogdando commented Sep 4, 2023

this seems need a full rewrite after the change in edpm roles structure (no roles, but playbooks?)

@SeanMooney
Copy link
Contributor

this seems need a full rewrite after the change in edpm roles structure (no roles, but playbooks?)

this has not changed we still have roles we just also have playbooks for the openstackdataplane services

so this does not need a full rewrite i just need to be extracted either into its own role or into one of the existing roles.

if you extract it into its own role then that role will need to be called form the bootstrap playbook.

@bogdando bogdando changed the title [WIP] Extract KSM management task to bootstrap Extract KSM management task to bootstrap Sep 13, 2023
@bogdando
Copy link
Contributor Author

bogdando commented Sep 13, 2023

Why not the edpm_kernel role in a ksm.yml tasks file?

@slagle is it OK to use it for bootstrap role? Do you insist to move this logic into edpm_kernel?
I'm not sure, the latter looks very much specific to managing kernel modules and sysctl vars, not some arbitrary actions

@bogdando bogdando requested a review from slagle September 13, 2023 11:25
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/41ce3aceb1db44e3830d4fc4e79e8998

✔️ edpm-ansible-molecule-edpm_podman SUCCESS in 6m 47s
✔️ edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 40s
✔️ edpm-ansible-molecule-edpm_kernel SUCCESS in 12m 38s
✔️ edpm-ansible-molecule-edpm_libvirt SUCCESS in 6m 46s
✔️ edpm-ansible-molecule-edpm_nova SUCCESS in 6m 52s
✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 41m 04s
podified-multinode-edpm-deployment-crc FAILURE in 1h 21m 03s
edpm-ansible-crc-podified-edpm-baremetal FAILURE in 1h 25m 27s

@slagle
Copy link
Contributor

slagle commented Sep 15, 2023

Why not the edpm_kernel role in a ksm.yml tasks file?

@slagle is it OK to use it for bootstrap role? Do you insist to move this logic into edpm_kernel? I'm not sure, the latter looks very much specific to managing kernel modules and sysctl vars, not some arbitrary actions

I do think it should be in a separate role from bootstrap. Otherwise bootstrap is just a catch all for a bunch of unrelated tasks.

@bogdando
Copy link
Contributor Author

Why not the edpm_kernel role in a ksm.yml tasks file?

@slagle is it OK to use it for bootstrap role? Do you insist to move this logic into edpm_kernel? I'm not sure, the latter looks very much specific to managing kernel modules and sysctl vars, not some arbitrary actions

I do think it should be in a separate role from bootstrap. Otherwise bootstrap is just a catch all for a bunch of unrelated tasks.

ok, I'll update this then, thanks!

@bogdando bogdando force-pushed the ksm_extract branch 2 times, most recently from 32f7f72 to 1de8339 Compare September 20, 2023 12:32
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/19827e5e16604e389bd9729b7be39d6d

openstack-k8s-operators-content-provider RETRY_LIMIT in 9s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ edpm-ansible-molecule-edpm_podman SUCCESS in 6m 12s
edpm-ansible-molecule-edpm_module_load RETRY_LIMIT in 3s
✔️ edpm-ansible-molecule-edpm_kernel SUCCESS in 11m 32s
✔️ edpm-ansible-molecule-edpm_libvirt SUCCESS in 6m 16s
✔️ edpm-ansible-molecule-edpm_nova SUCCESS in 6m 28s

@rebtoor
Copy link
Contributor

rebtoor commented Sep 20, 2023

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/36cedca9c2434fef92ffc86b97c23c1a

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 46m 32s
podified-multinode-edpm-deployment-crc FAILURE in 1h 22m 58s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 30m 46s
✔️ edpm-ansible-molecule-edpm_podman SUCCESS in 6m 13s
✔️ edpm-ansible-molecule-edpm_module_load SUCCESS in 4m 58s
✔️ edpm-ansible-molecule-edpm_kernel SUCCESS in 11m 04s
✔️ edpm-ansible-molecule-edpm_libvirt SUCCESS in 6m 15s
✔️ edpm-ansible-molecule-edpm_nova SUCCESS in 6m 06s

@rebtoor
Copy link
Contributor

rebtoor commented Sep 20, 2023

@bogdando jobs is failing with this error:

2023-09-20T14:13:13.481258510+00:00 stdout F TASK [osp.edpm.edpm_kernel : Disable KSM services] *****************************
2023-09-20T14:13:13.482572161+00:00 stdout F Wednesday 20 September 2023  14:13:13 +0000 (0:00:00.054)       0:00:32.375 *** 
2023-09-20T14:13:14.231864503+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksm.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksm.service", "msg": "Could not find the requested service ksm.service: host"}�[0m
2023-09-20T14:13:14.876269491+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksmtuned.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksmtuned.service", "msg": "Could not find the requested service ksmtuned.service: host"}�[0m

Is the package who owns those files missing?

@bogdando
Copy link
Contributor Author

@bogdando jobs is failing with this error:

2023-09-20T14:13:13.481258510+00:00 stdout F TASK [osp.edpm.edpm_kernel : Disable KSM services] *****************************
2023-09-20T14:13:13.482572161+00:00 stdout F Wednesday 20 September 2023  14:13:13 +0000 (0:00:00.054)       0:00:32.375 *** 
2023-09-20T14:13:14.231864503+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksm.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksm.service", "msg": "Could not find the requested service ksm.service: host"}�[0m
2023-09-20T14:13:14.876269491+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksmtuned.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksmtuned.service", "msg": "Could not find the requested service ksmtuned.service: host"}�[0m

Is the package who owns those files missing?

it looks like ksmtuned is a leftover from rhel8 support https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/nova/nova-compute-container-puppet.yaml#L1678
I will probably remove it, if my assumption for no longer maintaining rhel8 for NG EDPM was correct. But it may be not, as we still have mixed-rhel adoption story. Putting this on hold

/hold

@slagle
Copy link
Contributor

slagle commented Sep 21, 2023

@bogdando jobs is failing with this error:

2023-09-20T14:13:13.481258510+00:00 stdout F TASK [osp.edpm.edpm_kernel : Disable KSM services] *****************************
2023-09-20T14:13:13.482572161+00:00 stdout F Wednesday 20 September 2023  14:13:13 +0000 (0:00:00.054)       0:00:32.375 *** 
2023-09-20T14:13:14.231864503+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksm.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksm.service", "msg": "Could not find the requested service ksm.service: host"}�[0m
2023-09-20T14:13:14.876269491+00:00 stdout F �[0;31mfailed: [edpm-compute-0] (item=ksmtuned.service) => {"ansible_loop_var": "item", "changed": false, "item": "ksmtuned.service", "msg": "Could not find the requested service ksmtuned.service: host"}�[0m

Is the package who owns those files missing?

it looks like ksmtuned is a leftover from rhel8 support https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/nova/nova-compute-container-puppet.yaml#L1678 I will probably remove it, if my assumption for no longer maintaining rhel8 for NG EDPM was correct. But it may be not, as we still have mixed-rhel adoption story. Putting this on hold

/hold

There is no mixed-rhel support that includes rhel 8 for NG. You can drop any rhel8 specific tasks.

@bogdando
Copy link
Contributor Author

/unhold

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8fe899d555874bf6b5bce7b3428e4818

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 29s
podified-multinode-edpm-deployment-crc FAILURE in 1h 21m 09s
✔️ edpm-ansible-molecule-edpm_podman SUCCESS in 5m 45s
✔️ edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 02s
✔️ edpm-ansible-molecule-edpm_kernel SUCCESS in 12m 55s
✔️ edpm-ansible-molecule-edpm_libvirt SUCCESS in 8m 21s
✔️ edpm-ansible-molecule-edpm_nova SUCCESS in 8m 28s

@bogdando
Copy link
Contributor Author

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f2f21ee9ee6a4f1ca189d3ca42d5cf2f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 33m 46s
podified-multinode-edpm-deployment-crc FAILURE in 1h 20m 06s
✔️ edpm-ansible-molecule-edpm_podman SUCCESS in 5m 56s
✔️ edpm-ansible-molecule-edpm_module_load SUCCESS in 5m 17s
✔️ edpm-ansible-molecule-edpm_kernel SUCCESS in 11m 13s
✔️ edpm-ansible-molecule-edpm_libvirt SUCCESS in 6m 03s
✔️ edpm-ansible-molecule-edpm_nova SUCCESS in 6m 31s

@rebtoor
Copy link
Contributor

rebtoor commented Sep 26, 2023

@bogdando Still failing:

2023-09-25T15:19:09.692801347+00:00 stdout F TASK [osp.edpm.edpm_kernel : Disable KSM services] *****************************
2023-09-25T15:19:09.693937138+00:00 stdout F Monday 25 September 2023  15:19:09 +0000 (0:00:00.067)       0:00:33.795 ****** 
2023-09-25T15:19:10.412622013+00:00 stdout F �[0;31mfatal: [edpm-compute-0]: FAILED! => {"changed": false, "msg": "Could not find the requested service ksm.service: host"}�[0m

KSM is not Nova compute specific, manage it in the
generic EDPM kernel role's deidcated task instead.

Additionally, ensure the bootstrap role always installs
the dependency package ksmtuned.

Signed-off-by: Bohdan Dobrelia <[email protected]>
@openshift-ci openshift-ci bot added the lgtm label Sep 26, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bogdando, rebtoor, slagle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit fc0cc7d into openstack-k8s-operators:main Sep 26, 2023
@bogdando bogdando deleted the ksm_extract branch September 27, 2023 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants