Bootstrap failure detection #603
Comments
We will need a way for the CAPZ controller to determine the bootstrap status of a given AzureMachine. Does Azure have any service that the bootstrap logic in the VM could send an update to (such as a message queue) to indicate bootstrap success? Or maybe some way to label or tag the Azure VM? This would need to be properly secured, presumably.
I think the credentials in azure.json (or MSI) should do the trick for worker authentication, assuming we set up RBAC correctly. There's an existing boot diagnostics option on VMs that streams cloud-init and boot-time logs into a plain text file in a storage account. Uploading a sentinel file that way or tagging a VM would probably be doable. If you're thinking of something more like a message queue (perhaps to build on later?), Service Bus is a pretty traditional pub/sub message broker, and Event Grid is a more loosely coupled event source/sink model with handlers. Edit: I now see the AWS discussion in the CAPA issue; I missed it in the CAPI issue.
Azure also supports adding a tag to the VM.
Whatever we do, we need to make sure that the information is available for inspection at any time, and that CAPZ doesn't have to be running to receive a notification. I want to avoid a situation where bootstrapping fails, but CAPZ crashes or goes offline for whatever reason, and because it's down, we miss the notification that bootstrapping failed. It sounds like the options you suggested are all feasible, as long as messages are persisted until consumption.
/assign
@nader-ziada the tags are only visible to the Azure user, which might be one downside of using tags (i.e. if we ever wanted to gather data on the percentage of bootstrap failures/successes).
/priority important-soon
/assign @jackfrancis Jack and I are going to start putting together a CAPZ proposal / design doc covering the different ways we can implement this in Azure. We will drop a link to the doc here for collaboration once it exists.
@CecileRobertMichon: GitHub didn't allow me to assign the following users: jackfrancis. Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
/assign
@CecileRobertMichon and I have written up our investigation and conclusions here: https://docs.google.com/document/d/1U0GxvO6ltgIINMjpQz96UD4bExlN2h21wyyn-3ENezc I think the next step is to produce a concrete proposal for implementing a CAPZ-specific, named Azure VM Extension to include with CAPZ-provisioned AzureMachine VMs.
@CecileRobertMichon is this still open?
Yes, the last part is to add the script in the extension that checks for the sentinel file and sets conditions.
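As a rough illustration of that remaining step, the sentinel check could look something like the sketch below. It is written in Python for brevity and is not the actual extension script; the function name `wait_for_sentinel`, the default timeout and polling interval, and whatever sentinel path is passed in are all assumptions, not details taken from the design doc.

```python
import os
import time


def wait_for_sentinel(path, timeout_s=1200.0, interval_s=5.0):
    """Poll for a sentinel file.

    Returns True as soon as `path` exists, or False if it has not
    appeared by the time `timeout_s` seconds have elapsed.
    The default timeout and interval are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True  # bootstrap reported success
        if time.monotonic() >= deadline:
            return False  # treat a missing sentinel as bootstrap failure
        time.sleep(interval_s)
```

A wrapper script could call this with the agreed sentinel location and exit non-zero when it returns False, so that the extension's provisioning state reflects the bootstrap result and stays available for inspection even if CAPZ was offline when the failure happened.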
/kind feature
Describe the solution you'd like
If bootstrapping doesn't succeed, set AzureMachine.Status.FailureReason and FailureMessage to indicate there was an error.
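A minimal sketch of that mapping, in Python rather than the controller's Go, might look as follows. `MachineStatus`, `apply_bootstrap_result`, and the reason string `BootstrapFailed` are hypothetical names used only for illustration; the only parts taken from the request above are the two fields, FailureReason and FailureMessage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineStatus:
    # Stand-in for AzureMachine.Status; only the two failure fields are modeled.
    failure_reason: Optional[str] = None
    failure_message: Optional[str] = None


def apply_bootstrap_result(status, succeeded, detail=""):
    """Set the failure fields when bootstrapping failed; leave them unset on success."""
    if succeeded:
        return status
    status.failure_reason = "BootstrapFailed"  # hypothetical reason string
    status.failure_message = detail or "bootstrap did not report success"
    return status
```

For example, `apply_bootstrap_result(MachineStatus(), False, "sentinel file not found")` yields a status whose failure reason and message are both populated, while a successful result leaves both fields as `None`.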
Anything else you would like to add:
More information in kubernetes-sigs/cluster-api#2554.
We may need to amend the bootstrap provider contract to require providers to write a sentinel file indicating success to a specific location. Not all bootstrap providers will necessarily use cloud-init, so cloud-init cannot serve as a consistent means of checking success or failure. I will probably write up a separate proposal for that.