This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: run accelerated unattended-upgrade at node creation time #4217

Merged on Feb 3, 2021 (4 commits)
Changes from 3 commits
1 change: 1 addition & 0 deletions docs/topics/clusterdefinitions.md
@@ -857,6 +857,7 @@ A cluster can have 0 to 12 agent pool profiles. Agent Pool Profiles are used for
| adminUsername | yes | Describes the username to be used on all linux clusters |
| ssh.publicKeys[].keyData | yes | The public SSH key used for authenticating access to all Linux nodes in the cluster |
| secrets | no | Specifies an array of key vaults to pull secrets from and what secrets to pull from each |
| runUnattendedUpgradesOnBootstrap | no | Invoke an unattended-upgrade when each Linux node VM comes online for the first time. In practice this is accomplished by performing an `apt-get update`, followed by an invocation of `/usr/bin/unattended-upgrade`, to fetch updated apt configuration and install the package updates that unattended-upgrade is configured to apply, respectively. |
| customSearchDomain.name | no | describes the search domain to be used on all linux clusters |
| customSearchDomain.realmUser | no | describes the realm user with permissions to update dns registries on Windows Server DNS |
| customSearchDomain.realmPassword | no | describes the realm user password to update dns registries on Windows Server DNS |
12 changes: 12 additions & 0 deletions parts/k8s/cloud-init/artifacts/cse_helpers.sh
@@ -193,6 +193,18 @@ apt_get_dist_upgrade() {
done
echo Executed apt-get dist-upgrade $i times
}
unattended_upgrade() {
  retries=10
  for i in $(seq 1 $retries); do
    wait_for_apt_locks
    /usr/bin/unattended-upgrade && break ||
    if [ $i -eq $retries ]; then
      return 1
    else sleep 5
    fi
  done
  echo Executed unattended-upgrade $i times
}
systemctl_restart() {
retries=$1; wait_sleep=$2; timeout=$3 svcname=$4
for i in $(seq 1 $retries); do
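
The retry pattern used by `unattended_upgrade` above (try, stop on the first success, sleep between failures, give up after a fixed number of attempts) can be sketched in Go as follows. This is only an illustration of the control flow, not code from this PR: `retry`, `failNTimes`, and `errTransient` are hypothetical names, and the sketch leaves out the `wait_for_apt_locks` call that the shell helper makes before every attempt.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error standing in for a failed
// /usr/bin/unattended-upgrade run (for example, an apt lock being held).
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times and stops at the first success.
// It sleeps between failed attempts, but not after the last one.
// It returns the number of calls made and the last error (nil on success).
func retry(attempts int, sleep time.Duration, fn func() error) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(sleep)
		}
	}
	return attempts, err
}

// failNTimes returns a function that fails n times and then succeeds.
func failNTimes(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errTransient
		}
		return nil
	}
}

func main() {
	// The shell helper uses 10 retries and a 5 second sleep; the sleep is
	// set to 0 here so the sketch runs instantly.
	n, err := retry(10, 0, failNTimes(2))
	fmt.Printf("attempts=%d err=%v\n", n, err)
}
```

As in the shell version, exhausting every attempt surfaces an error to the caller, which is why `apt_get_update && unattended_upgrade` in `cse_main.sh` can chain on the exit status.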
4 changes: 4 additions & 0 deletions parts/k8s/cloud-init/artifacts/cse_main.sh
@@ -276,6 +276,10 @@ if [[ $OS == $UBUNTU_OS_NAME ]]; then
fi
{{end}}

{{- if RunUnattendedUpgrades}}
apt_get_update && unattended_upgrade
Member Author

My tests so far show that when security updates are available, running apt-get update and then a successful unattended-upgrade, serially, picks them up. So we can trust that the "runUnattendedUpgradesOnBootstrap" feature does the right thing and actually applies the OS updates (including the reboot, when one is required) during cluster creation.

Collaborator

So, in the past I saw this not always work, but that could have been timing-related, depending on when other things are set up relative to cloud-init. This is likely a better place to do it.

Is there a reason that this would not be the default behavior?

Member Author

The primary reason is the judgment that having a node reboot before first coming online introduces (1) undesirable delay and (2) a demonstrable loss in node bootstrap reliability.

I don't think we can avoid (1). Most of the time, a node that starts with a stale OS security package configuration will take longer to come online if it has to get up to date first, especially when that requires a reboot. That is always going to drag up the average node bootstrap time.

I wonder about (2) though. Can we summarize the additional risk of scooping up untested packages, plus any additional risk that a VM OS won't successfully come back online?

Collaborator

The risk is relatively low but is not zero. We have not had an outage due to the security updates as they are vetted relatively well. The question is how bad is it to run a node without the security updates?

I am not saying someone could not opt out, but it is a question of which way we should be "safe by default" and what "safe" means.

Collaborator

I claim we should start here and maybe change the default after some more testing.

/lgtm

{{- end}}

if [ -f /var/run/reboot-required ]; then
  trace_info "RebootRequired" "reboot=true"
  /bin/bash -c "shutdown -r 1 &"
1 change: 1 addition & 0 deletions pkg/api/converterfromapi.go
@@ -174,6 +174,7 @@ func convertLinuxProfileToVLabs(obj *LinuxProfile, vlabsProfile *vlabs.LinuxProf
		vlabsProfile.CustomNodesDNS = &vlabs.CustomNodesDNS{}
		vlabsProfile.CustomNodesDNS.DNSServer = obj.CustomNodesDNS.DNSServer
	}
	vlabsProfile.RunUnattendedUpgradesOnBootstrap = obj.RunUnattendedUpgradesOnBootstrap
}

func convertWindowsProfileToVLabs(api *WindowsProfile, vlabsProfile *vlabs.WindowsProfile) {
1 change: 1 addition & 0 deletions pkg/api/convertertoapi.go
@@ -169,6 +169,7 @@ func convertVLabsLinuxProfile(vlabs *vlabs.LinuxProfile, api *LinuxProfile) {
		api.CustomNodesDNS = &CustomNodesDNS{}
		api.CustomNodesDNS.DNSServer = vlabs.CustomNodesDNS.DNSServer
	}
	api.RunUnattendedUpgradesOnBootstrap = vlabs.RunUnattendedUpgradesOnBootstrap
}

func convertVLabsWindowsProfile(vlabs *vlabs.WindowsProfile, api *WindowsProfile) {
13 changes: 7 additions & 6 deletions pkg/api/types.go
@@ -133,12 +133,13 @@ type LinuxProfile struct {
 	SSH struct {
 		PublicKeys []PublicKey `json:"publicKeys"`
 	} `json:"ssh"`
-	Secrets               []KeyVaultSecrets   `json:"secrets,omitempty"`
-	Distro                Distro              `json:"distro,omitempty"`
-	ScriptRootURL         string              `json:"scriptroot,omitempty"`
-	CustomSearchDomain    *CustomSearchDomain `json:"customSearchDomain,omitempty"`
-	CustomNodesDNS        *CustomNodesDNS     `json:"CustomNodesDNS,omitempty"`
-	IsSSHKeyAutoGenerated *bool               `json:"isSSHKeyAutoGenerated,omitempty"`
+	Secrets                          []KeyVaultSecrets   `json:"secrets,omitempty"`
+	Distro                           Distro              `json:"distro,omitempty"`
+	ScriptRootURL                    string              `json:"scriptroot,omitempty"`
+	CustomSearchDomain               *CustomSearchDomain `json:"customSearchDomain,omitempty"`
+	CustomNodesDNS                   *CustomNodesDNS     `json:"CustomNodesDNS,omitempty"`
+	IsSSHKeyAutoGenerated            *bool               `json:"isSSHKeyAutoGenerated,omitempty"`
+	RunUnattendedUpgradesOnBootstrap *bool               `json:"runUnattendedUpgradesOnBootstrap,omitempty"`
 }

// PublicKey represents an SSH key for LinuxProfile
9 changes: 5 additions & 4 deletions pkg/api/vlabs/types.go
@@ -136,10 +136,11 @@ type LinuxProfile struct {
 	SSH struct {
 		PublicKeys []PublicKey `json:"publicKeys" validate:"required,min=1"`
 	} `json:"ssh" validate:"required"`
-	Secrets            []KeyVaultSecrets   `json:"secrets,omitempty"`
-	ScriptRootURL      string              `json:"scriptroot,omitempty"`
-	CustomSearchDomain *CustomSearchDomain `json:"customSearchDomain,omitempty"`
-	CustomNodesDNS     *CustomNodesDNS     `json:"customNodesDNS,omitempty"`
+	Secrets                          []KeyVaultSecrets   `json:"secrets,omitempty"`
+	ScriptRootURL                    string              `json:"scriptroot,omitempty"`
+	CustomSearchDomain               *CustomSearchDomain `json:"customSearchDomain,omitempty"`
+	CustomNodesDNS                   *CustomNodesDNS     `json:"customNodesDNS,omitempty"`
+	RunUnattendedUpgradesOnBootstrap *bool               `json:"runUnattendedUpgradesOnBootstrap,omitempty"`
 }

// PublicKey represents an SSH key for LinuxProfile
6 changes: 6 additions & 0 deletions pkg/engine/template_generator.go
@@ -775,6 +775,12 @@ func getContainerServiceFuncMap(cs *api.ContainerService) template.FuncMap {
		"GetLinuxCSELogPath": func() string {
			return linuxCSELogPath
		},
		"RunUnattendedUpgrades": func() bool {
			if cs.Properties.LinuxProfile != nil {
				return to.Bool(cs.Properties.LinuxProfile.RunUnattendedUpgradesOnBootstrap)
			}
			return false
		},
		"OpenBraces": func() string {
			return "{{"
		},
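
To illustrate how a `FuncMap` entry such as `RunUnattendedUpgrades` gates the `{{- if RunUnattendedUpgrades}}` block in `cse_main.sh`, here is a minimal, self-contained sketch using Go's `text/template`. It makes several assumptions: `linuxProfile` and `render` are hypothetical stand-ins for the real `api.LinuxProfile` and the template generator, the template is reduced to a single line without whitespace trimming, and the nil-safe dereference that `to.Bool` performs is inlined to avoid the external dependency.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// linuxProfile is a hypothetical, trimmed-down stand-in for api.LinuxProfile.
type linuxProfile struct {
	RunUnattendedUpgradesOnBootstrap *bool
}

// render executes a one-line script template whose body is emitted only
// when the RunUnattendedUpgrades template function returns true.
func render(lp *linuxProfile) (string, error) {
	funcs := template.FuncMap{
		"RunUnattendedUpgrades": func() bool {
			// A nil profile and a nil pointer both mean "feature off".
			return lp != nil &&
				lp.RunUnattendedUpgradesOnBootstrap != nil &&
				*lp.RunUnattendedUpgradesOnBootstrap
		},
	}
	const script = "{{if RunUnattendedUpgrades}}apt_get_update && unattended_upgrade{{end}}"
	// Funcs must be registered before Parse so the name resolves.
	tmpl, err := template.New("cse").Funcs(funcs).Parse(script)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	on := true
	for _, lp := range []*linuxProfile{nil, {}, {RunUnattendedUpgradesOnBootstrap: &on}} {
		out, err := render(lp)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Because the decision is made at template-generation time, a cluster that leaves the option unset gets a provisioning script with no upgrade step in it at all.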
16 changes: 16 additions & 0 deletions pkg/engine/templates_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions test/e2e/test_cluster_configs/everything.json
@@ -146,6 +146,7 @@
}
],
"linuxProfile": {
"runUnattendedUpgradesOnBootstrap": true,
"adminUsername": "azureuser",
"ssh": {
"publicKeys": [
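
The new field is declared as `*bool` with `omitempty`, so an API model that omits it can be told apart from one that sets it to `false`. The sketch below shows that decoding behaviour with `encoding/json`. It is an assumption-laden illustration: `linuxProfile` is a hypothetical two-field subset of the real struct, and `decode` is a helper invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linuxProfile is a hypothetical subset of the LinuxProfile type; only the
// JSON tags are taken from the diff above.
type linuxProfile struct {
	AdminUsername                    string `json:"adminUsername"`
	RunUnattendedUpgradesOnBootstrap *bool  `json:"runUnattendedUpgradesOnBootstrap,omitempty"`
}

// decode parses a linuxProfile JSON fragment and reports whether the flag
// was present at all, and its effective value (false when absent).
func decode(raw string) (set bool, enabled bool, err error) {
	var lp linuxProfile
	if err = json.Unmarshal([]byte(raw), &lp); err != nil {
		return false, false, err
	}
	if lp.RunUnattendedUpgradesOnBootstrap == nil {
		return false, false, nil
	}
	return true, *lp.RunUnattendedUpgradesOnBootstrap, nil
}

func main() {
	for _, raw := range []string{
		`{"adminUsername": "azureuser", "runUnattendedUpgradesOnBootstrap": true}`,
		`{"adminUsername": "azureuser", "runUnattendedUpgradesOnBootstrap": false}`,
		`{"adminUsername": "azureuser"}`,
	} {
		set, enabled, err := decode(raw)
		fmt.Println(set, enabled, err)
	}
}
```

Keeping "unset" distinguishable from "false" is what would allow the default to change later, as discussed in the review thread, without overriding clusters that explicitly opted out.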