Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-15087: templates: Introduce kubelet-dependencies.target #3865

Merged

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Aug 15, 2023

The primary motivation here is to eventually stop pulling container images Before=network-online.target because it creates complicated dependency loops.

We introduce a new kubelet-dependencies.target - both crio.service and kubelet.service are After+Requires=kubelet-dependencies.target.

This also makes it cleaner for pre-kubelet services, which can just order themselves Before=kubelet-dependencies.target.

This is just a small preparatory PR to introduce the target unit. The real bigger change will come when we move services like ovs-configuration.service and node-valid-hostname.service to actually be After=network-online.target + Before=kubelet-dependendies.target.

Crucially, this will unblock services like machine-config-daemon-pull.service that want to fetch containers before kubelet.service but want to be After=network-online.target.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 15, 2023
@cgwalters cgwalters force-pushed the introduce-kubelet-dependencies branch from e7fb1cf to e7ad4e3 Compare August 15, 2023 21:19
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2023
@cgwalters
Copy link
Member Author

xref #3858 and #3788 which want this.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cgwalters
Copy link
Member Author

cgwalters commented Aug 15, 2023

Totally untested pre-PR, so

/test e2e-aws-ovn
/test e2e-hypershift

@cgwalters
Copy link
Member Author

/test all

@cgwalters
Copy link
Member Author

Need to verify this works with upgrades too, this will be the first time we've added a .target unit into Ignition configs etc.

@cgwalters
Copy link
Member Author

The primary motivation here is to eventually stop pulling container images Before=network-online.target because it creates complicated dependency loops.

Another way to say this: I don't think e.g. ovs-configuration.service needs to be Before=network-online.target. It only needs to be Before=kubelet.service crio.service. Injecting all this stuff into network-online.target is just polluting the meaning of the target, particularly because we actually do need to fetch container images before reaching that target!

@cgwalters
Copy link
Member Author

cc @cybertron @trozet

@cgwalters
Copy link
Member Author

Hmm, I don't understand the build controller unit failures?

     build_controller_test.go:457: 
        	Error Trace:	/go/src/github.com/openshift/machine-config-operator/pkg/controller/build/fixtures_test.go:181
        	            				/go/src/github.com/openshift/machine-config-operator/pkg/controller/build/build_controller_test.go:457
        	Error:      	Received unexpected error:
        	            	timed out waiting for the condition
        	Test:       	TestBuildControllerMultipleOptedInPools/Happy_Path_Multiple_Configs/worker/CustomBuildPod/rendered-worker-9
        	Messages:   	MachineConfigPool worker never reached desired state
    build_controller_test.go:463: Config name, actual: rendered-worker-9, expected: rendered-worker-9
    build_controller_test.go:464: Is build success? false
    build_controller_test.go:465: Has all MachineConfig refs? true 

Not immediately reproducing this locally...

@cybertron
Copy link
Member

Some thoughts on this, but the TLDR is that I think we should try it.

I have concerns with reaching network-online before ovs-configuration has run since that will cause a network blip when it runs and reconfigures things. If something does rely on network-online to know when networking is ready, it may start before ovs-configuration runs and then be broken when the bridge changes are made.

However, I think this is mitigated by a couple of things:

  1. No user workloads will be running until Kubelet starts, so we should have the ability to massage dependencies in any services deployed prior to that. I guess the exception to this would be if a user deployed a service directly to the system via machine-config. If such a service got broken by this change, at least it's simple to fix (change target from network-online to kubelet-dependencies, which should be largely equivalent).
  2. The bugs being caused by this[0] are severe enough that we may need to accept some risk of breaking edge cases to fix the more common case. Being unable to login to a system to debug a problem during early boot makes troubleshooting very difficult.

0: https://issues.redhat.com/browse/OCPBUGS-18541

@cybertron
Copy link
Member

I did some testing with this locally and it seems to do what we want. I updated the ovs-configuration and node-valid-hostname dependencies to use this target and then injected an artificial delay into nodeip-configuration. It no longer blocked network-online:

[root@master-0 core]# systemctl status nodeip-configuration | grep Active
     Active: inactive (dead) since Fri 2023-09-22 22:41:45 UTC; 29min ago
[root@master-0 core]# systemctl status network-online.target | grep Active
     Active: active since Fri 2023-09-22 22:40:44 UTC; 31min ago # <-- Earlier than nodeip-configuration finished
[root@master-0 core]# systemctl status kubelet | grep Active
     Active: active (running) since Fri 2023-09-22 22:41:56 UTC; 30min ago # <-- Still after nodeip-configuration finished

@cgwalters
Copy link
Member Author

I have concerns with reaching network-online before ovs-configuration has run since that will cause a network blip when it runs and reconfigures things. If something does rely on network-online to know when networking is ready, it may start before ovs-configuration runs and then be broken when the bridge changes are made.

The other factor is that network-online.target is really just a hint. Anything doing networking must be resilient to blips and flakes. Having things poll to eventual consistentency is the whole model we're aiming for in Kubernetes.

(It's really just this "pre-kubelet" stuff where we haven't really ingrained that mindset!)

@cgwalters cgwalters marked this pull request as ready for review September 26, 2023 13:48
@cgwalters
Copy link
Member Author

OK rebased 🏄 on master, and I added another commit to actually do the binding of ovs-configuration and node-valid-hostname.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2023
@cgwalters cgwalters changed the title templates: Introduce kubelet-dependencies.target OCPBUGS-15087: templates: Introduce kubelet-dependencies.target Sep 26, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 26, 2023
@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references Jira Issue OCPBUGS-15087, which is invalid:

  • expected the bug to target the "4.15.0" version, but it targets "4.14.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The primary motivation here is to eventually stop pulling container images Before=network-online.target because it creates complicated dependency loops.

We introduce a new kubelet-dependencies.target - both crio.service and kubelet.service are After+Requires=kubelet-dependencies.target.

This also makes it cleaner for pre-kubelet services, which can just order themselves Before=kubelet-dependencies.target.

This is just a small preparatory PR to introduce the target unit. The real bigger change will come when we move services like ovs-configuration.service and node-valid-hostname.service to actually be After=network-online.target + Before=kubelet-dependendies.target.

Crucially, this will unblock services like machine-config-daemon-pull.service that want to fetch containers before kubelet.service but want to be After=network-online.target.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 26, 2023
@cgwalters
Copy link
Member Author

I've mirrored to quay.io/cgwalters/release:4.14.0-0.ci.test-2023-10-04-145907-ci-ln-4illsfk-latest-x86_64 and kicked off a cluster upgrade

@cgwalters
Copy link
Member Author

Ohhhh right; actually oc adm release mirror doesn't rewrite the pull specs in the image, so that didn't help. We'll have to manually craft a release image with just this updated MCO code instead...looking

@sergiordlr
Copy link

sergiordlr commented Oct 5, 2023

We have installed the fix in azure and it was installed OK. (We saw an azure installation failing because of kargs problems in a master node, this problem does not seem to be related to our fix and in a retry the installation succeeded)

We have installed the fix using in IPI on OSP and we have been able to create 10 new workers using a 4.6 cloud image.

We have installed the fix using UPI on Vsphere and we have been able to create 10 new workers using a 4.6 cloud image.

No ordering cycle problems found.

The only problem that we found happens when using 4.1 cloud images. The problem happens because the podman version is too old and the pre-merge images cannot be pulled correctly using this version. This problem should not happen with the nightly/official builds. Since we cannot test it pre-merge, what we will do is to test it post-merge using the nightly builds.

~~We can add the qe-approved label.

/label qe-approved~~

PS: Finally the azure problem in the installation seems related to our fix.

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 5, 2023
@sergiordlr
Copy link

/remove-label qe-approved

While investigating the azure error I found that maybe it is related to our fix. Please, could you have a look at this journal log?
journal-azure-master.log

@openshift-ci openshift-ci bot removed the qe-approved Signifies that QE has signed off on this PR label Oct 5, 2023
The primary motivation here is to stop pulling
container images `Before=network-online.target` because it
creates complicated dependency loops.

This is aiming to fix
https://issues.redhat.com/browse/OCPBUGS-15087

A lot of our services are "explicitly coupled" with ordering
relationships; e.g. some had `Before=kubelet.service` but not
`Before=crio.service`.

systemd .target units are explicitly designed for this situation.

We introduce a new `kubelet-dependencies.target` - both `crio.service`
and `kubelet.service` are `After+Requires=kubelet-dependencies.target`.
And units which are needed for kubelet should now be both
`Before + RequiredBy=kubelet-dependencies.target`.

Similarly, we had a lot of entangling of the "node services"
and the firstboot OS updates, with things explicitly ordering
against `machine-config-daemon-pull.service` or poking into
the implementation details of the firstboot process with
`ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json`.

Create a new `firstboot-osupdate.target` that succeds after the
`machine-config-daemon-firstboot.service` today.  Then most of the
"infrastructure workload" that must run only on the second boot
(such as `gcp-hostname.service`, `openshift-azure-routes.path` etc)
can cleanly order after that.

This also aids with the coming work for bare metal installs to do
OS udpates at install time, because then we will "finalize" the OS
update and continue booting.
@cgwalters
Copy link
Member Author

Good catch! Latest commit should fix it.

@cgwalters
Copy link
Member Author

(hmm we should add a test case to the e2e that verifies there are no ordering cycles across the board)

@cgwalters cgwalters force-pushed the introduce-kubelet-dependencies branch from 7c35b05 to 2141f4b Compare October 5, 2023 14:49
@cgwalters
Copy link
Member Author

/test e2e-azure

@cgwalters
Copy link
Member Author

The e2e-azure run has no ordering cycles in the journal.

@sergiordlr
Copy link

sergiordlr commented Oct 5, 2023

We installed it in UPI on GCP, IPI on GCP, UPI on Azure, IPI on Azure

In IPI on OSP

  1. Install
  2. Create 10 new workers using the 4.6 cloud image

In UPI on Vsphrere

  1. Install
  2. Create 20 new workers using the 4.6 cloud image.

No problems were found.

The only problem was the 4.1 image in test case "63894-Scaleup using 4.1 cloud image". It will have to be tested post-merge.

We can add the qe-approeved label

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 5, 2023
@cgwalters
Copy link
Member Author

I think the current presubmit failures were transient flake
/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 6, 2023

@cgwalters: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 2141f4b link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters
Copy link
Member Author

I think we're good for someone to lgtm this?

@sinnykumari
Copy link
Contributor

Let's get this in. We have good signal from payload testing and our QE testing.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [sinnykumari,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sinnykumari
Copy link
Contributor

@rioliu-rh @sergiordlr Is there still anything we need before remove hold on the PR?

@sergiordlr
Copy link

@sinnykumari we can safely remove the /hold, no problem.

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2023
@openshift-ci openshift-ci bot merged commit 0f5a113 into openshift:master Oct 10, 2023
15 of 16 checks passed
@openshift-ci-robot
Copy link
Contributor

@cgwalters: Jira Issue OCPBUGS-15087: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-15087 has been moved to the MODIFIED state.

In response to this:

The primary motivation here is to eventually stop pulling container images Before=network-online.target because it creates complicated dependency loops.

We introduce a new kubelet-dependencies.target - both crio.service and kubelet.service are After+Requires=kubelet-dependencies.target.

This also makes it cleaner for pre-kubelet services, which can just order themselves Before=kubelet-dependencies.target.

This is just a small preparatory PR to introduce the target unit. The real bigger change will come when we move services like ovs-configuration.service and node-valid-hostname.service to actually be After=network-online.target + Before=kubelet-dependendies.target.

Crucially, this will unblock services like machine-config-daemon-pull.service that want to fetch containers before kubelet.service but want to be After=network-online.target.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@cgwalters: #3865 failed to apply on top of branch "release-4.14":

Applying: Introduce kubelet-dependencies.target and firstboot-osupdate.target
.git/rebase-apply/patch:77: trailing whitespace.
  
.git/rebase-apply/patch:413: trailing whitespace.
  
.git/rebase-apply/patch:64: new blank line at EOF.
+
warning: 3 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	templates/common/_base/units/machine-config-daemon-firstboot.service.yaml
M	templates/common/_base/units/machine-config-daemon-pull.service.yaml
Falling back to patching base and 3-way merge...
Auto-merging templates/common/_base/units/machine-config-daemon-pull.service.yaml
CONFLICT (content): Merge conflict in templates/common/_base/units/machine-config-daemon-pull.service.yaml
Auto-merging templates/common/_base/units/machine-config-daemon-firstboot.service.yaml
CONFLICT (content): Merge conflict in templates/common/_base/units/machine-config-daemon-firstboot.service.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Introduce kubelet-dependencies.target and firstboot-osupdate.target
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member Author

Cherry pick is in #3967 - just trivial conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants