[AzureManagedCluster][Spot] AzureManagedMachinePool fluctuating between Running and Provisioned states #4112
/triage accepted
@esierra-stratio With the log verbosity cranked up to at least 4 with the

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: spot
  namespace: default
spec:
  availabilityZones:
    - "3"
  mode: User
  name: spot
  sku: Standard_D2s_v3
  scaleSetPriority: Spot
  scaling:
    minSize: 1
    maxSize: 6
  scaleDownMode: Delete
  linuxOSConfig:
    sysctls:
      vmMaxMapCount: 262144
  nodeLabels:
    backup: "false"
  osDiskSizeGB: 50
  osDiskType: Managed
  osType: Linux
```
^ My mistake, I didn't notice that the machine pool wasn't getting created in the first place when I was trying to repro (looks like an AKS bug). Still working out the details, but do you see the fluctuation stop if you define
Yep, looks like it's because the AKS API is populating
@CecileRobertMichon Do you think it would be better to default
@nojnhuh I think it doesn't make sense to set …
Is this a change in the AKS API with the move to SDK v2?
It looks like AKS only defaults that on Spot node pools.
I'm guessing this also affects v1.10 since spotMaxPrice was a pointer before. I don't remember ever testing the scenario where spotMaxPrice wasn't defined on a Spot node pool.
Ah, spotMaxPrice was only added in v1.11: 6b96d7f
I can confirm this bug did exist in the original PR before SDK v2.
@CecileRobertMichon This bug doesn't exist in #4069 with ASO, so if we default spotMaxPrice in the webhook here, could we remove it later? Or would that be a breaking change? I wonder if it might be marginally safer not to default it in CAPZ, if we can avoid it, given the mostly hypothetical scenario that AKS changes its sentinel "unlimited" value from "-1" to something else in a future API version.
@nojnhuh Is the ASO resource defaulting under the hood in a similar way to the new default behavior in this PR? If so, then I don't think shipping this change and then removing it in CAPZ as part of the ASO resource introduction would constitute a breaking change. Do we know why the ASO flow doesn't produce the flapping?
This would be a breaking change in the AKS API, so I would be surprised if it happened. If it does, though, we have ways to handle it when we bump to the hypothetical AKS API version that changes this. Having explicit defaults is safer overall, especially for fields that are immutable. My vote would be to just keep defaulting.
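Based on the comments above, the flapping appears to happen when spotMaxPrice is left unset on a Spot pool: AKS fills in its "-1" sentinel, so the spec CAPZ wants and the spec AKS reports keep differing. The fragment below is a minimal, untested sketch of the explicit-value approach, trimmed from the repro manifest earlier in the thread. It assumes the v1.11 spotMaxPrice field accepts "-1", which Azure documents as "pay up to the on-demand price, with no price-based eviction". It is not the fix that was merged.

```yaml
# Sketch only: the Spot pool from the repro above, with spotMaxPrice set explicitly.
# Assumption: "-1" is the AKS "unlimited" sentinel discussed in this thread.
# Fields omitted here are unchanged from the original manifest.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: spot
  namespace: default
spec:
  mode: User
  name: spot
  sku: Standard_D2s_v3
  scaleSetPriority: Spot
  # Match the value AKS would otherwise default on Spot pools, so the
  # desired spec should no longer differ from what AKS reports.
  spotMaxPrice: "-1"
```

Because explicit defaults matter most for immutable fields, setting this on an already-provisioned pool may be rejected. In that case it would only apply to newly created pools.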
The value defaulted by AKS will be reflected in the …
If a defaulted value for …
I was somewhat disconnected during the last few days of last week because I was on vacation. Thanks for the effort!
/kind bug
AzureManagedMachinePools in an AzureManagedCluster with Spot enabled are continuously fluctuating between the Running and Provisioned states.

What steps did you take and what happened:
Deploy an AzureManagedCluster with Provider Version 1.11.3. Below is the configuration of one of the affected AzureManagedMachinePools:

What did you expect to happen:
The AzureManagedMachinePool remains in the Running state when it successfully passes its MachineHealthChecks.

Anything else you would like to add:
Some of the pieces of evidence are:
Additional information in there:
capz-controller-manager:

Environment:
Kubernetes version (use kubectl version): v1.27.0