Enable configuration and disabling of boot diagnostics #2401

Closed
JoelSpeed opened this issue Jun 20, 2022 · 8 comments · Fixed by #2528
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@JoelSpeed
Contributor

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Currently, as per the implementation from #901, boot diagnostics are enabled by default and use an Azure managed storage account. This storage account limits the boot logs to 1 GB per machine and costs $0.05 per GB per month.

You can also use your own storage account for boot diagnostics, which allows you to set custom retention policies for the data and gives you more direct access to the underlying files should you need it.

I would like to propose adding configuration options so that users can supply their own storage account if they need to, and can also disable boot diagnostics entirely if they are not interested in them.

In large deployments, e.g. a managed service running thousands of Machines, this data is costly and likely not always required (how often do we expect boot failures?), so I think it would be good to have the option to disable diagnostics to save money.
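For a rough sense of scale (illustrative numbers only, not from the issue): at $0.05 per GB per month, a fleet of 5,000 Machines each retaining the full 1 GB of boot logs would cost on the order of 5,000 × 1 GB × $0.05 = $250 per month purely for diagnostics storage.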

I would expect an API along the lines of the following, though I'm happy to discuss alternatives if others have suggestions:

diagnostics:
  boot:
    storageAccountType: AzureManaged | CustomerManaged | Disabled # defaults to AzureManaged for backwards compatibility
    customerManaged: # only valid when the account type is CustomerManaged (a discriminated union)
      storageAccountName: <blah>
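For illustration, a rough sketch of what Go API types behind this YAML shape could look like (the type and field names here are hypothetical and are not taken from the issue or the eventual PR):

// Package v1beta1 sketches hypothetical CAPZ API types for the proposal above.
package v1beta1

// BootDiagnosticsStorageAccountType enumerates where boot diagnostics data is stored.
type BootDiagnosticsStorageAccountType string

const (
	// AzureManagedDiagnosticsStorage uses an Azure managed storage account (the current default behaviour).
	AzureManagedDiagnosticsStorage BootDiagnosticsStorageAccountType = "AzureManaged"
	// CustomerManagedDiagnosticsStorage uses a storage account supplied by the user.
	CustomerManagedDiagnosticsStorage BootDiagnosticsStorageAccountType = "CustomerManaged"
	// DisabledDiagnosticsStorage disables boot diagnostics entirely.
	DisabledDiagnosticsStorage BootDiagnosticsStorageAccountType = "Disabled"
)

// Diagnostics configures VM diagnostics.
type Diagnostics struct {
	// Boot configures boot diagnostics; unset defaults to AzureManaged for backwards compatibility.
	Boot *BootDiagnostics `json:"boot,omitempty"`
}

// BootDiagnostics is a discriminated union keyed on StorageAccountType.
type BootDiagnostics struct {
	// StorageAccountType selects the discriminated union member.
	StorageAccountType BootDiagnosticsStorageAccountType `json:"storageAccountType"`
	// CustomerManaged may only be set when StorageAccountType is CustomerManaged.
	CustomerManaged *CustomerManagedBootDiagnostics `json:"customerManaged,omitempty"`
}

// CustomerManagedBootDiagnostics holds settings for a user-supplied storage account.
type CustomerManagedBootDiagnostics struct {
	// StorageAccountName is the name of the user-supplied storage account.
	StorageAccountName string `json:"storageAccountName"`
}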

Happy to volunteer to implement this if this is something maintainers feel comfortable supporting.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 20, 2022
@CecileRobertMichon
Contributor

No objections on my part to making this configurable while keeping it enabled by default to preserve backwards compatibility.

For the API, we might want to use storageUri instead of the storage account name, since that's what the Azure APIs use. I would also avoid the terms "CustomerManaged" and "AzureManaged" and instead stick with the official terms, i.e. Managed and UserManaged: https://docs.microsoft.com/en-us/azure/virtual-machines/boot-diagnostics.

The way the Azure API works is as follows, for reference:

// DiagnosticsProfile specifies the boot diagnostic settings state.
// Minimum api-version: 2015-06-15.
type DiagnosticsProfile struct {
	// BootDiagnostics - Boot Diagnostics is a debugging feature which allows you to view Console Output
	// and Screenshot to diagnose VM status.
	// NOTE: If storageUri is being specified then ensure that the storage account is in the same region
	// and subscription as the VM.
	// You can easily view the output of your console log.
	// Azure also enables you to see a screenshot of the VM from the hypervisor.
	BootDiagnostics *BootDiagnostics `json:"bootDiagnostics,omitempty"`
}

// BootDiagnostics boot Diagnostics is a debugging feature which allows you to view Console Output and
// Screenshot to diagnose VM status.
// You can easily view the output of your console log.
// Azure also enables you to see a screenshot of the VM from the hypervisor.
type BootDiagnostics struct {
	// Enabled - Whether boot diagnostics should be enabled on the Virtual Machine.
	Enabled *bool `json:"enabled,omitempty"`
	// StorageURI - Uri of the storage account to use for placing the console output and screenshot.
	// If storageUri is not specified while enabling boot diagnostics, managed storage will be used.
	StorageURI *string `json:"storageUri,omitempty"`
}
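For illustration, a rough sketch of how a CAPZ-side spec using the suggested Managed/UserManaged terminology could be mapped onto the DiagnosticsProfile shape above (the spec type, its field names, and the SDK API version in the import path are assumptions, not taken from the actual implementation in #2528):

// Package diagnostics is a hypothetical sketch, not the actual CAPZ code.
package diagnostics

import (
	// DiagnosticsProfile / BootDiagnostics as quoted above; the API version
	// in this import path is an assumption.
	"github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2021-07-01/compute"
)

// BootDiagnosticsSpec is a hypothetical CAPZ-side representation using the
// Managed/UserManaged terminology suggested above.
type BootDiagnosticsSpec struct {
	StorageAccountType string  // "Managed", "UserManaged", or "Disabled"
	StorageURI         *string // only used when StorageAccountType is "UserManaged"
}

// boolPtr returns a pointer to b.
func boolPtr(b bool) *bool { return &b }

// ToDiagnosticsProfile converts the spec into the Azure API shape above.
func ToDiagnosticsProfile(spec *BootDiagnosticsSpec) *compute.DiagnosticsProfile {
	// No spec: keep today's default of Azure managed boot diagnostics.
	if spec == nil {
		return &compute.DiagnosticsProfile{
			BootDiagnostics: &compute.BootDiagnostics{Enabled: boolPtr(true)},
		}
	}
	switch spec.StorageAccountType {
	case "Disabled":
		return &compute.DiagnosticsProfile{
			BootDiagnostics: &compute.BootDiagnostics{Enabled: boolPtr(false)},
		}
	case "UserManaged":
		// The storage account must be in the same region and subscription as
		// the VM (see the SDK comment above).
		return &compute.DiagnosticsProfile{
			BootDiagnostics: &compute.BootDiagnostics{
				Enabled:    boolPtr(true),
				StorageURI: spec.StorageURI,
			},
		}
	default: // "Managed": omit StorageURI so Azure uses a managed storage account.
		return &compute.DiagnosticsProfile{
			BootDiagnostics: &compute.BootDiagnostics{Enabled: boolPtr(true)},
		}
	}
}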

@JoelSpeed
Contributor Author

That makes sense to me. On the StorageURI front, the only question I have is: since that's a common format (https://<account-name>.blob.core.windows.net/), I was thinking we could make it easier for users by doing that string substitution for them, rather than having them provide the full URI. Is that not something you would normally do in CAPZ?
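For illustration, the substitution being suggested would amount to something like the following (assuming the default public-cloud blob endpoint format; as the next comment notes, that format is not guaranteed for every account):

package main

import "fmt"

// blobEndpointForAccount derives the default public-cloud blob endpoint from a
// storage account name; it assumes the classic *.blob.core.windows.net format.
func blobEndpointForAccount(accountName string) string {
	return fmt.Sprintf("https://%s.blob.core.windows.net/", accountName)
}

func main() {
	// Prints: https://mydiagaccount.blob.core.windows.net/
	fmt.Println(blobEndpointForAccount("mydiagaccount"))
}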

@CecileRobertMichon
Contributor

It would be something we could do if we were confident no other formats were possible, but given https://docs.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview that may already not be true.

@JoelSpeed
Contributor Author

Cool, I'll look into working up a PR over the next few days

@CecileRobertMichon
Contributor

/assign @JoelSpeed

@damdo
Member

damdo commented Aug 2, 2022

Following up on the discussion, here is a PR that implements the agreed design: #2528

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2022
@JoelSpeed
Contributor Author

/remove-lifecycle stale

This is waiting on review

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2022