Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Enable VM boot diagnostics #901

Merged

Conversation

CecileRobertMichon
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon commented Aug 27, 2020

What this PR does / why we need it: Azure VM and VMSS support a boot diagnostics feature which streams cloud init logs and boot time output. This is incredibly useful for debugging bootstrapping failures when using cloud init, as all the logs get exfiltrated from the machine.

This PR enables that option on all VM/VMSS deployments using the managed storage account option (recommended) which means there is no need to create a storage account. In the future, we can consider enabling a custom storage account for users to provide their own storage if there is demand for that flexibility.

Note that boot diagnostics also work on Windows VMs https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/boot-diagnostics.

Screen Shot 2020-08-26 at 5 14 10 PM

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #606

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Enabled VM boot diagnostics

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 27, 2020
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Aug 27, 2020
Copy link
Contributor

@devigned devigned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, managed storage... I see. I was confused going through the code b/c I was looking for storage references to be setup for using boot diagnostics. Using managed storage is a much better option!

Without the need to provision the storage associated with the boot diagnostics, this is a relatively small change set.

LGTM

@mboersma
Copy link
Contributor

using the managed storage account option (recommended) which means there is no need to create a storage account

So judging from the content of this PR, managed storage is the silent default? It sounds like good magic but I confess I'm clueless how it works and couldn't find specific info in Azure docs either.

@CecileRobertMichon
Copy link
Contributor Author

I think it's new

Screen Shot 2020-08-27 at 10 08 02 AM

@CecileRobertMichon
Copy link
Contributor Author

/assign @mboersma @alexeldeib

@nader-ziada
Copy link
Contributor

tried creating a cluster with this branch and got a very useful log about the start of the machine in the Boot diagnostics sub menu on the Azure portal, its kind of too big to paste here and has keys as well

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2020
@CecileRobertMichon
Copy link
Contributor Author

/approve

self-approving since 2 approvers have said lgtm :)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 31, 2020
@CecileRobertMichon
Copy link
Contributor Author

/retest

kind error

@nader-ziada
Copy link
Contributor

/test pull-cluster-api-provider-azure-e2e

@k8s-ci-robot k8s-ci-robot merged commit 3c0b4df into kubernetes-sigs:master Sep 1, 2020
@k8s-ci-robot k8s-ci-robot added this to the v0.4.8 milestone Sep 1, 2020
@CecileRobertMichon CecileRobertMichon deleted the boot-diagnostics branch July 9, 2021 18:39
nawazkh added a commit to nawazkh/cluster-api-provider-azure that referenced this pull request Dec 20, 2022
- when upgrading from v1alpha4 -> v1beta1, AzureMachine CRD gets updated upon cluster upgrade.
This upgrade updates AzureMachine.Spec.Diagnostics to "Managed", matching the existing behavior as implemented in kubernetes-sigs#901

- This PR enables upgrading nil AzureMachine.Spec.Diagnostic to "Managed" and unblocks v1alpha4 -> v1beta1 upgrades.
nawazkh added a commit to nawazkh/cluster-api-provider-azure that referenced this pull request Dec 21, 2022
- when upgrading from v1alpha4 -> v1beta1, AzureMachine CRD gets updated upon cluster upgrade.
This upgrade updates AzureMachine.Spec.Diagnostics to "Managed", matching the existing behavior as implemented in kubernetes-sigs#901

- This PR enables upgrading nil AzureMachine.Spec.Diagnostic to "Managed" and unblocks v1alpha4 -> v1beta1 upgrades.
- adds test cases to test validateUpdate() in case of nil AzureMachine.Spec.Diagnostics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enable vm bootdiagnostics
6 participants