-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Linux VM extension bootstrap script and conditions #1232
Add Linux VM extension bootstrap script and conditions #1232
Conversation
6828aea
to
1ddcd95
Compare
1ddcd95
to
0366cd4
Compare
Just realized it's a wip pr, please feel free to ignore the review comments that are not relevant. |
0366cd4
to
240c5b5
Compare
return err | ||
} | ||
_, err = future.Result(ac.vmextensions) | ||
_, err := ac.vmextensions.CreateOrUpdate(ctx, resourceGroupName, vmName, name, parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devigned thoughts on this approach of doing "async" reconciliation? Is it significantly less efficient than storing the future or since we only care about eventually getting the extension status do you think it's appropriate in this situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking in the future we can move this over to VM CreateOrUpdate once that's async. For now the extension could be a long-running operation (timeout is 20 minutes) so we don't want that to be part of a blocking operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine. Though, it would be easy to call this multiple times if one is not tracking an ongoing operation via a secret. With the AMP future, we check to see if the operation is done prior to proceeding, which the Azure SDK for Go can help with. In this case, you'd be responsible for managing that state.
436e638
to
e700487
Compare
c861661
to
e7e6060
Compare
switch infrav1.ProvisioningState(provisioningState) { | ||
case infrav1.Succeeded: | ||
m.V(4).Info("extension provisioning state is succeeded", "vm extension", extensionName, "scale set", m.Name()) | ||
conditions.MarkTrue(m.AzureMachinePool, infrav1.BootstrapSucceededCondition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be nice to set conditions per VMSS VM, hopefully that gets easier with #819
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it's at the AzureMachinePool.Status level:
status:
conditions:
- lastTransitionTime: "2021-03-16T23:53:39Z"
status: "True"
type: BoostrapSucceeded
- lastTransitionTime: "2021-03-16T23:49:37Z"
status: "True"
type: ScaleSetRunning
Thanks for the initial review @shysank! I've addressed most of your comments. I still have docs and unit tests as TODOs. |
e7e6060
to
ad702de
Compare
dea6219
to
669b1c0
Compare
669b1c0
to
7d5b441
Compare
7d5b441
to
638c14f
Compare
@CecileRobertMichon: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This is ready for review now |
when I try running it, I don't see the
then goes directly to
not sure why yet, will dig into it a bit more update: I see this error in my logs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Only had a minor comment about async APIs.
return err | ||
} | ||
_, err = future.Result(ac.vmextensions) | ||
_, err := ac.vmextensions.CreateOrUpdate(ctx, resourceGroupName, vmName, name, parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine. Though, it would be easy to call this multiple times if one is not tracking an ongoing operation via a secret. With the AMP future, we check to see if the operation is done prior to proceeding, which the Azure SDK for Go can help with. In this case, you'd be responsible for managing that state.
@nader-ziada I wonder if you're doing something differently, this is what I see when scaling up:
That first step I'm hoping we can change to |
I tried it again and seems to work as expected
|
/lgtm @devigned any comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: devigned The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
What this PR does / why we need it: Implements bootstrap failure detection using conditions and VM extensions as proposed in #1076.
This is what a user will see when an AzureMachine has been successfully provisioned but has not yet finished running the bootstrap script:
After the bootstrap script has executed successfully, the AzureMachine status shows:
If for some reason the bootstrap script fails to execute, the status will show:
Also renames type "VMState" to "ProvisioningState" to be more generic and match Azure naming. This is a breaking change for anyone importing CAPZ types, but should have no impact on user templates and existing CRDs (conversion from v1alpha3 to v1alpha4 is auto-generated).
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #603
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: