feat: run accelerated unattended-upgrade at node creation time #4217
@@ -276,6 +276,10 @@ if [[ $OS == $UBUNTU_OS_NAME ]]; then
fi
{{end}}

{{- if RunUnattendedUpgrades}}
apt_get_update && apt_get_dist_upgrade && unattended_upgrade
In practice (I think) the unattended_upgrade invocation here is superfluous: update and dist-upgrade will effectively do the deed. I'm including it here to be extra explicit.
Perhaps @Michael-Sinz can confirm if this is sane.
Mainly I trust our apt_get_update and apt_get_dist_upgrade functions to definitively accomplish those tasks over silently calling /usr/bin/unattended-upgrade. The latter (by design) silently fails single invocations (it knows it will be invoked again, so it's not in a rush) if, for example, various apt locks are held (there are probably other reasons).
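The "wait for apt locks, then retry" idea behind that concern can be sketched roughly as follows. This is a minimal sketch, not the CSE implementation: wait_for_apt_locks and retry_cmd are hypothetical helper names (they are not the apt_get_update / apt_get_dist_upgrade functions from the scripts), and the lock file list is an assumption about a typical Ubuntu layout.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_apt_locks and retry_cmd are hypothetical helpers,
# not the functions defined in the AKS Engine CSE scripts.

# Lock files commonly held by apt/dpkg (assumed paths); override for testing.
APT_LOCK_FILES="${APT_LOCK_FILES:-/var/lib/dpkg/lock /var/lib/apt/lists/lock /var/cache/apt/archives/lock}"

# Return 0 once no process holds any lock file, or 1 after $1 attempts,
# sleeping $2 seconds between attempts.
wait_for_apt_locks() {
  local attempts="$1" sleep_secs="$2" i f busy
  for ((i = 0; i < attempts; i++)); do
    busy=0
    for f in $APT_LOCK_FILES; do
      # fuser exits 0 when some process has the file open.
      if fuser "$f" >/dev/null 2>&1; then busy=1; fi
    done
    [ "$busy" -eq 0 ] && return 0
    sleep "$sleep_secs"
  done
  return 1
}

# Run "$@" up to $1 times, sleeping $2 seconds after each failed attempt
# except the last. Returns 0 on the first success, 1 if every attempt fails.
retry_cmd() {
  local attempts="$1" sleep_secs="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$sleep_secs"
  done
  return 1
}
```

A caller might chain these as `wait_for_apt_locks 60 5 && retry_cmd 3 10 apt-get update`, so that a lock held by another process shows up as an explicit failure after a bounded wait rather than a silent skip. The attempt counts and sleep intervals here are illustrative, not tuned values.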
The big difference between unattended-upgrades and apt-get dist-upgrade is the list of things each will install.
Unattended upgrades is constrained to the list of updates that are deemed safe and vital for security/reliability. These are not minor feature updates unless a feature update was required for security. (This is the default and recommended configuration for unattended-upgrade.)
For example, on a test VM, I just logged in and noticed this right now:
58 packages can be updated.
4 updates are security updates.
After running unattended-upgrades on that machine (which cron normally does for me on a regular basis), the login looks like this:
54 packages can be updated.
0 updates are security updates.
This is very different from a full apt-get update/apt-get upgrade (which itself does less than apt-get dist-upgrade).
The actual Ubuntu unattended-upgrade command will return an error if it fails to complete an update, but it is constrained to the security updates.
Another good thing about unattended-upgrades is that it sets the unattended settings for apt/apt-get/dpkg so that it should not hang (packages can still cause this problem, but that is rare in the security patches).
Which to use is really a question of balancing all of the risks.
We run unattended-upgrade on a regular basis because we can trust it at scale.
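To make the before/after comparison above scriptable, the login banner text can be reduced to two numbers. The following is a minimal sketch that assumes the banner wording matches the lines quoted above; parse_update_summary and has_security_updates are hypothetical helper names, not existing tooling.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers that parse the banner text quoted above.

# Read banner text on stdin and print "<total> <security>".
# A count whose line is absent is reported as 0.
parse_update_summary() {
  awk '
    /can be updated/  { total = $1 }     # e.g. "58 packages can be updated."
    /security update/ { security = $1 }  # e.g. "4 updates are security updates."
    END { printf "%d %d\n", total, security }
  '
}

# Read "<total> <security>" on stdin; succeed if security updates are pending.
has_security_updates() {
  local total security
  read -r total security
  [ "${security:-0}" -gt 0 ]
}
```

On Ubuntu this text is usually produced by the update-notifier tooling, so something like `/usr/lib/update-notifier/apt-check --human-readable | parse_update_summary` may work, but that path and flag are assumptions to verify on the target image.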
PS - It is redundant to run unattended-upgrade after having done the full upgrade or dist-upgrade.
It may be useful to run unattended-upgrade first, just to be sure the security updates complete before getting into the larger set (both from a security standpoint and for the ability to complete them).
So I would not run unattended afterwards.
This all makes sense. What's perplexing is that, in practice, simply adding a "wait for apt locks and then run unattended-upgrade" step during CSE does not, in my tests, produce the expected /var/run/reboot-required outcome (a symptom of critical security updates arriving).
I'm going to try apt-get update && unattended-upgrade next.
Codecov Report
@@           Coverage Diff           @@
##           master    #4217   +/-   ##
=======================================
  Coverage   73.36%   73.36%
=======================================
  Files         135      135
  Lines       20849    20855    +6
=======================================
+ Hits        15296    15301    +5
- Misses       4576     4577    +1
  Partials      977      977
@@ -276,6 +276,10 @@ if [[ $OS == $UBUNTU_OS_NAME ]]; then
fi
{{end}}

{{- if RunUnattendedUpgrades}}
apt_get_update && unattended_upgrade
My tests so far show that the above works: when security updates are available, running apt-get update and then unattended-upgrade, serially, gets them. So we can trust that the "runUnattendedUpgradesOnBootstrap" feature does the right thing and actually applies the OS updates (i.e., reboots) during cluster creation.
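A check along these lines could be scripted as below. This is a minimal sketch under stated assumptions: upgrade_and_check_reboot is a hypothetical wrapper (not part of the CSE scripts), it calls the stock apt-get and unattended-upgrade binaries rather than the apt_get_update / unattended_upgrade functions from the diff, and the exit code 10 is an arbitrary convention chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: upgrade_and_check_reboot is a hypothetical wrapper.
# The commands and the marker path are overridable so the flow can be dry-run.
UPDATE_CMD="${UPDATE_CMD:-apt-get update}"
UPGRADE_CMD="${UPGRADE_CMD:-unattended-upgrade}"
REBOOT_REQUIRED_FILE="${REBOOT_REQUIRED_FILE:-/var/run/reboot-required}"

# Refresh the package lists, then apply unattended upgrades, serially.
# Returns: 0  = both steps succeeded, no reboot marker present
#          10 = both steps succeeded, reboot marker present
#          1  = one of the steps failed
upgrade_and_check_reboot() {
  # Intentionally unquoted so "apt-get update" splits into command + argument.
  $UPDATE_CMD || return 1
  $UPGRADE_CMD || return 1
  if [ -f "$REBOOT_REQUIRED_FILE" ]; then
    return 10
  fi
  return 0
}
```

Distinguishing "upgraded, reboot pending" from "upgraded, nothing to do" lets the caller decide whether to reboot before the node comes online, which is the behavior being validated here.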
So, in the past I saw this not always work, but that could have been timing-related, depending on when other things are set up with respect to cloud-init. This is likely a better place to do it.
Is there a reason that this would not be the default behavior?
The primary reason is the judgment that having a node reboot before first coming online brings (1) undesirable delay and (2) a demonstrable loss in node bootstrap reliability.
I don't think we can avoid #1: most of the time it's going to take longer for nodes to come online if they start with a stale OS security package configuration and have to come up to date, even if that requires a reboot. That is always going to drag up the average node bootstrap time.
I wonder about #2 though. Can we summarize the additional risk of scooping up untested packages, plus any additional risk that a VM OS won't successfully come back online?
The risk is relatively low but is not zero. We have not had an outage due to the security updates as they are vetted relatively well. The question is how bad is it to run a node without the security updates?
I am not saying someone could not opt out, but it is a question of which way we should be "safe by default" and what "safe" means.
I claim we should start here and maybe change the default after some more testing.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis, Michael-Sinz
Reason for Change:
This PR adds a runUnattendedUpgradesOnBootstrap option to the linuxProfile api model configuration, to allow folks to explicitly accelerate the acceptance of new downstream packages on node VMs when bringing them online.
In practice this will slow down node creation time, and will require extra post-installation validation, as any installed packages that were not already present on the AKS Engine-curated VHD will not have been tested (this assumes you're using one of those VHDs).
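For orientation, opting in might look like the fragment printed below. This is a sketch, not a complete api model: the nesting of linuxProfile under "properties" and the boolean value are assumptions, and emit_linux_profile_fragment is a hypothetical helper used only to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Sketch only: prints a partial api model fragment. The "properties" nesting
# and the boolean type are assumptions; all other required fields are omitted.
emit_linux_profile_fragment() {
  cat <<'EOF'
{
  "properties": {
    "linuxProfile": {
      "runUnattendedUpgradesOnBootstrap": true
    }
  }
}
EOF
}
```

Leaving the option unset keeps the current behavior, so nodes come online without the extra upgrade step.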
Fixes #4156
Issue Fixed:
Credit Where Due:
Does this change contain code from or inspired by another project?
If "Yes," did you notify that project's maintainers and provide attribution?
Requirements:
Notes: