-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/daemon: allow reboots from external componenents #576
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: runcom The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -319,9 +319,15 @@ func (dn *Daemon) bootstrapNode() error { | |||
return err | |||
} | |||
dn.node = node | |||
if _, err := os.Stat("/etc/defaults/rhcos/reboot-needed"); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, /etc/defaults
is a legacy of SySV init; no reason to put anything new there.
The right place for this is more likely /run/machine-config-operator/reboot-needed
or so. (We also have /etc/machine-config-daemon
used for /etc/machine-config-daemon/state.json
but I think /run
makes more sense since it goes away automatically if the machine is rebooted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ashcrow
@@ -400,6 +406,15 @@ func (dn *Daemon) syncNode(key string) error { | |||
return err | |||
} | |||
} | |||
// FIXME: this is a temporary workaround, remove once an MCD signaling system is in place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -400,6 +406,15 @@ func (dn *Daemon) syncNode(key string) error { | |||
return err | |||
} | |||
} | |||
// FIXME: this is a temporary workaround, remove once an MCD signaling system is in place | |||
if _, err := os.Stat("/etc/defaults/rhcos/reboot-needed"); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
What other components are going to be asking for a reboot? |
This is a temporary stopgap to allow other componenents ask for a reboot since the MCD is the one in charge of draining a node and rebooting it. The logic is simple with this: - if an update is already coming, we just go through the update and if successful, the machine is gonna reboot just fine - if we're not updating through MachineConfigs, then we check for a file and drain+reboot - the "reboot file" is the removed on MCD boot Signed-off-by: Antonio Murdaca <[email protected]>
hopefully just the one @ashcrow needs for 4.1 |
Yeah, it's just RHCOS and Installer team for 4.1 (same use case and underlying tool) Context: https://url.corp.redhat.com/bc7c76c |
I don't understand why that work isn't driven by the MCD. We already have other needs to adjust kernel arguments (per #388 ) - and further it's going to be problematic to have some other service running As is today AFAICS it will race; there's no ordering relationship between that service and Further it overlaps heavily with ostreedev/ostree#479 which Robert is working on to aid in dropping Anaconda and removing the dependency on grub2 with ostree in general which will solve various other issues. |
Short term hack, we can make this new service |
@cgwalters I'll update the host level service with
|
Remember that the "early pivot" happens before the node has joined the cluster. So the MCD isn't involved here - no need to drain anything. Can we de-scope this from "arbitrary kernel arguments are supported changing at any time" to "If Today, we can assume that Maybe the simplest is actually to add this to (Though that leaves the "BYO host" scenario, but will they really be expecting us to inject kernel args and reboot traditional hosts? This seems like more in the territory of "you brought it, you manage it"?) |
To elaborate on this a bit...it looks like the current prototype service runs every boot. That greatly increases the scope; if we only have to care about it during early pivot, thereafter the |
I'm OK with the above proposal for the near term. I'm going to bounce it off other folks who were part of the design before we commit to change. The POC today runs whenever the |
/cc @abhinavdahiya |
I'm fine if we support |
I spoke with others and they are cool with this for 4.1. How about
If this sounds good I'll shelve my system service API for now and throw something together starting in the morning. |
isn't this allowing "arbitrary kernel arguments"? for 4.1, we just want For the future, then yeah, I agree with what you wrote, that can be a path towards allowing general and arbitrary kargs. |
It can be expanded to be arbitrary. For now it will only be the subset for tuning.
Almost. It needs to be in early pivot or toggled later. |
@runcom thank you for putting this PR together, but I think it can be closed. I'll hopefully have something up today or tomorrow in it's place. |
Thanks!!! |
If we really need to support toggling it later I don't see how we avoid implementing #388 right? |
For a later version, yes. But if we scope this to the few tuning parameters that are needed we the current MCO/MC and update to pivot should work. The installer can lay down a file at install time as needed and, if the cluster admin wants to modify the supported tuning options, they can lay down an updated |
|
Checks /etc/pivot/kernel-args for argument changes. This early version only supports bare arguments that are in the whitelist. See conversation at openshift/machine-config-operator#576 Signed-off-by: Steve Milner <[email protected]>
Checks /etc/pivot/kernel-args for argument changes. This early version only supports bare arguments that are in the whitelist. See conversation at openshift/machine-config-operator#576 Signed-off-by: Steve Milner <[email protected]>
Checks /etc/pivot/kernel-args for argument changes. This early version only supports bare arguments that are in the whitelist. See conversation at openshift/machine-config-operator#576 Signed-off-by: Steve Milner <[email protected]>
Checks /etc/pivot/kernel-args for argument changes. This early version only supports bare arguments that are in the whitelist. See conversation at openshift/machine-config-operator#576 Signed-off-by: Steve Milner <[email protected]>
Checks /etc/pivot/kernel-args for argument changes. This early version only supports bare arguments that are in the whitelist. See conversation at openshift/machine-config-operator#576 Signed-off-by: Steve Milner <[email protected]>
This is a temporary stopgap to allow other componenents ask for a reboot
since the MCD is the one in charge of draining a node and rebooting it.
The logic is simple with this:
successful, the machine is gonna reboot just fine
and drain+reboot
Signed-off-by: Antonio Murdaca [email protected]