diff --git a/docs/status.md b/docs/status.md index ec60dae..ba353fa 100644 --- a/docs/status.md +++ b/docs/status.md @@ -55,7 +55,9 @@ the full solution. Step 2 is very much dependent on step 1 - there are bits of data or annotations that are needed to automatically reason about which node is most suited to -becoming the prototype pattern node. +becoming the prototype pattern node. We have a mechanism that is a +reasonable base implementation of Step 2 in the code based on the rules +with attributes as we have stated them. Step 3 is the part that no one (or no one we know of) has. This part can be used, in the worst case, by someone manually selecting the target node. @@ -68,6 +70,13 @@ Yes - our goal is to bring steps 1 and 2 as pluggable elements such that each of them could be replaced with their own implementations. Our goal is to have a basic reference implementation of all 3 components. +With the latest changes to the open source [Kured](https://github.com/weaveworks/kured) +tool, we now have a baseline of step 1 plus our [Kamino auto update](../helm/vmss-prototype/auto-update.md) for +steps 2 and 3. + Our hope is that these components are composable and replaceable as needed. -Again, the big push with doing step 3 first is that we feel that is the -most critical and unique component right now. +Again, the big push was to do step 3 first because we felt it was the +most critical and unique component. Having our step 2 code there and validated +against at least 2 implementations of step 1 (an internal system and now +the public [Kured](https://github.com/weaveworks/kured) project) gives us +confidence that Kamino is a viable, operational, and useful first release. diff --git a/helm/vmss-prototype/README.md b/helm/vmss-prototype/README.md index 7b5eae7..10f8845 100644 --- a/helm/vmss-prototype/README.md +++ b/helm/vmss-prototype/README.md @@ -66,13 +66,13 @@ The `vmss-prototype` operation carries out a procedural set of steps, each of wh 8. 
Cordon + drain the target node in preparation for taking it offline. If the cordon + drain fails, we will fail the operation _unless we pass in the `--force` option to the `vmss-prototype` tool (see the Helm Chart usage of `kamino.drain.force` below)_. 9. Deallocate the VMSS instance. This is a fancy, Azure-specific way of saying that we release the reservation of the underlying compute hardware running that instance virtual machine. This is a pre-condition to performing a snapshot of the underlying disk. 10. Make a snapshot of the OS disk image attached to the deallocated VMSS instance. -11. *Permanently delete the VMSS instance.* This is due to an [open issue](https://github.com/jackfrancis/kamino/issues/26). Long-term, we aim to solve that issue and simply re-introduce the snapshotted node back into the cluster. In the meanwhile, one operational side-effect of `vmss-prototype` is the loss of one node in the node pool. If you wish to re-add one node after `vmss-prototype` has completed updating the VMSS model, you may use the `--set kamino.newUpdatedNodes=1` option when invoking `helm install`. +11. Restart the node's VMSS instance that we just grabbed a snapshot of. 12. Uncordon the node to allow Kubernetes to schedule workloads onto it. 13. Remove the `cluster-autoscaler.kubernetes.io/scale-down-disabled` cluster-autoscaler node annotation as we no longer care if this node is chosen for removal by cluster-autoscaler. 14. Build a new SIG Image Definition _version_ (i.e., the actual image we're going to update the VMSS to use) from the recently captured snapshot image. This takes a long time! In our tests we see a 30 GB image (the OS disk size default for many Linux distros) take between 30 minutes and 2 _hours_ to be rendered as a SIG Image Definition version! 15. After the new SIG Image Definition version has been created, we delete the snapshot image as it will no longer be needed. 16. 
We now prune older SIG Image Definition versions (configurable, see the usage of `kamino.imageHistory` in the official Helm Chart docs below). -17. Update the target instance's VMSS model so that its OS image refers to the newly created SIG Image Definition version. This means that the very next instance built with this VMSS will derive from the newly created image. *This update operation does not affect existing instances: The `vmss-prototype` tool does not instruct the VMSS API to perform a "rolling upgrade" to ensure that all instances are running this new OS image! Similarly, `vmss-prototype` **will not** perform a "rolling upgrade" across the other, existing VMSS instances, nor will it create new, replacement instances, and delete old instances!* +17. Update the target instance's VMSS model so that its OS image refers to the newly created SIG Image Definition version. This means that the very next instance built with this VMSS will derive from the newly created image. *This update operation does not affect existing instances: The `vmss-prototype` tool does not instruct the VMSS API to perform a "rolling upgrade" to ensure that all instances are running this new OS image! Similarly, `vmss-prototype` **will not** perform a "rolling upgrade" across the other, existing VMSS instances, nor will it create new, replacement instances, or delete old instances!* 18. Update the target instance's cloud-init configuration so that it no longer includes "one-time bootstrap" configuration. Because this instance was _already_ bootstrapped when the cluster was created, we don't need to perform those various prerequisite file system operations: by updating the VMSS's OS image reference to a "post-bootstrapped" image, `vmss-prototype` has made it unnecessary for new instances to perform this cloud-init bootstrap overhead: our new nodes will come online more quickly! 19. 
Similarly, we remove any VMSS "Extensions" that were used to execute "one-time bootrap executable code" (i.e., all the stuff we execute to turn a vanilla Linux VM into a Kubernetes node running in a cluster), except for any "provenance-identifying" Extensions, e.g. "computeAksLinuxBilling". Similar to the cloud-init savings, `vmss-prototype` allows us to create new instances _already configured to come online immediately as Kubernetes nodes in this cluster!_ diff --git a/helm/vmss-prototype/auto-update.md b/helm/vmss-prototype/auto-update.md index 6eb9d28..06707f6 100644 --- a/helm/vmss-prototype/auto-update.md +++ b/helm/vmss-prototype/auto-update.md @@ -1,4 +1,4 @@ -# Manual Update +# Auto Update This is a higher level description of the basic functions of the VMSS-Prototype Pattern (Kamino) system. This goes into deep details as to what you are doing on the machine. @@ -81,7 +81,7 @@ Now we evaluate all of the nodes, filtering out those that are not valid candida 2021-01-14T17:10:13.384973400Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000000' '--resource-group' 'testCluster1' '--name' 'k8s-agentpool1-12345678-vmss' '--instance-ids' '2'] 2021-01-14T17:13:16.747177869Z k8s-agentpool1-12345678-vmss INFO: ===> Completed in 183.36s: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000000' '--resource-group' 'testCluster1' '--name' 'k8s-agentpool1-12345678-vmss' '--instance-ids' '2'] # RC=0 2021-01-14T17:13:16.747427571Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['az' 'snapshot' 'create' '--subscription' '00000000-0000-0000-0000-000000000000' '--resource-group' 'testCluster1' '--name' 'snapshot_k8s-agentpool1-12345678-vmss' '--source' '/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/testCluster1/providers/Microsoft.Compute/disks/k8s-agentpool1-18861k8s-agentpool1-188617OS__1_31149be82f654bdf8f25e77b8f64afae' '--tags' 
'BuiltFrom=k8s-agentpool1-12345678-vmss000002' 'BuiltAt=2021-01-14 17:13:16.747072'] -2021-01-14T17:13:22.566227674Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['az' 'vmss' 'delete-instances' '--subscription' '00000000-0000-0000-0000-000000000000' '--resource-group' 'testCluster1' '--name' 'k8s-agentpool1-12345678-vmss' '--instance-ids' '2' '--no-wait'] +2021-01-14T17:13:22.566227674Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['az' 'vmss' 'start' '--subscription' '00000000-0000-0000-0000-000000000000' '--resource-group' 'testCluster1' '--name' 'k8s-agentpool1-12345678-vmss' '--instance-ids' '2' '--no-wait'] 2021-01-14T17:13:24.087514857Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['kubectl' 'uncordon' 'k8s-agentpool1-12345678-vmss000002'] 2021-01-14T17:13:24.212719270Z k8s-agentpool1-12345678-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool1-12345678-vmss000002' 'cluster-autoscaler.kubernetes.io/scale-down-disabled-'] 2021-01-14T17:13:24.341068604Z k8s-agentpool1-12345678-vmss INFO: Creating sig image version - this can take quite a long time... @@ -136,9 +136,6 @@ See how this whole thing completed in less than 15 seconds, including collecting ``` - - - # Logs from a 2 VMSS cluster run with automatic mode This cluster has tools that automatically apply OS patches and do reboots as needed, setting the annotations needed. There is a PR for the open source [Kured](https://github.com/weaveworks/kured) tool that will have it set these annotations as needed but my test environment did not have that custom build so we used our internal tool. @@ -148,96 +145,129 @@ This cluster has tools that automatically apply OS patches and do reboots as nee This run is just like the first run on a single pool cluster only it found 2 pools that have prototype candidates to process. Note how these run in parallel. 
Most of the real processing happens in Azure and this one was especially slow - must have had some significant activity within my test subscription at the time. ``` -2021-01-14T17:18:56.438975547Z CMD: ['/usr/bin/vmss-prototype' '--in-cluster' '--log-level' 'DEBUG' '--log-prefix' 'auto-update' 'auto-update' '--new-updated-nodes' '0' '--grace-period' '5' '--max-history' '3' '--last-patch-annotation' 'LatestOSPatch' '--pending-reboot-annotation' 'PendingReboot' '--minimum-ready-time' '1h' '--minimum-candidates' '1'] -2021-01-14T17:18:56.439046247Z auto-update INFO: ===> Executing command: ['az' 'cloud' 'set' '--name' 'AzureCloud'] -2021-01-14T17:18:59.397722557Z auto-update INFO: ===> Executing command: ['az' 'login' '***'] -2021-01-14T17:19:00.384324492Z auto-update INFO: ===> Executing command: ['az' 'account' 'set' '--subscription' '00000000-0000-0000-0000-000000000001'] -2021-01-14T17:19:00.768999817Z auto-update INFO: ===> Executing command: ['kubectl' 'get' 'nodes' '--output' 'jsonpath={.items[*].metadata.name}'] -2021-01-14T17:19:02.184152151Z auto-update INFO: ===> Executing command: ['kubectl' 'get' 'nodes' '--output' 'json'] -2021-01-14T17:19:02.363603030Z auto-update INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype'] -2021-01-14T17:19:04.394680302Z auto-update DEBUG: Latest image for VMSS k8s-agentpool1-98765432-vmss is 0000.00.00 -2021-01-14T17:19:04.399800376Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-98765432-vmss00000a has been ready for only 20s and the minimum is 3600s -2021-01-14T17:19:04.399822076Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-98765432-vmss000009 not part of vmss k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.399890075Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-98765432-vmss00000a not part of vmss 
k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.399982075Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-98765432-vmss00000b not part of vmss k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.400072474Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-0 not part of vmss k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.400085174Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-1 not part of vmss k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.400171374Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-2 not part of vmss k8s-agentpool1-98765432-vmss -2021-01-14T17:19:04.400189774Z auto-update INFO: VMSS k8s-agentpool1-98765432-vmss: Picked candiate node k8s-agentpool1-98765432-vmss000008 from 1 candidates -2021-01-14T17:19:04.400461772Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool1-98765432-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool1-98765432-vmss000008' '--subscription' '00000000-0000-0000-0000-000000000001'] -2021-01-14T17:19:04.400978070Z auto-update INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype'] -2021-01-14T17:19:04.488966318Z CMD: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool1-98765432-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool1-98765432-vmss000008' '--subscription' '00000000-0000-0000-0000-000000000001'] -2021-01-14T17:19:04.488999818Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['kubectl' 'get' 'node' 'k8s-agentpool1-98765432-vmss000008'] -2021-01-14T17:19:04.676290556Z 
k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2'] -2021-01-14T17:19:06.588209240Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--description' 'Kamino VMSS images'] -2021-01-14T17:19:07.588451705Z auto-update DEBUG: Latest image for VMSS k8s-agentpool2-98765432-vmss is 0000.00.00 -2021-01-14T17:19:07.588557804Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-98765432-vmss000008 not part of vmss k8s-agentpool2-98765432-vmss -2021-01-14T17:19:07.588587904Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-98765432-vmss00000a not part of vmss k8s-agentpool2-98765432-vmss -2021-01-14T17:19:07.589205301Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-98765432-vmss00000b does not have a last patch annotation: LatestOSPatch -2021-01-14T17:19:07.589222201Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-0 not part of vmss k8s-agentpool2-98765432-vmss -2021-01-14T17:19:07.589359000Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-1 not part of vmss k8s-agentpool2-98765432-vmss -2021-01-14T17:19:07.589420800Z auto-update DEBUG: IGNORED: Node k8s-master-98765432-2 not part of vmss k8s-agentpool2-98765432-vmss -2021-01-14T17:19:07.589618599Z auto-update INFO: VMSS k8s-agentpool2-98765432-vmss: Picked candiate node k8s-agentpool2-98765432-vmss000009 from 2 candidates -2021-01-14T17:19:07.589950997Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool2-98765432-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool2-98765432-vmss000009' '--subscription' '00000000-0000-0000-0000-000000000001'] 
-2021-01-14T17:19:07.660802133Z CMD: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool2-98765432-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool2-98765432-vmss000009' '--subscription' '00000000-0000-0000-0000-000000000001'] -2021-01-14T17:19:07.660842033Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'get' 'node' 'k8s-agentpool2-98765432-vmss000009'] -2021-01-14T17:19:07.813644349Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2'] -2021-01-14T17:19:09.384061886Z k8s-agentpool2-98765432-vmss INFO: Processing VMSS k8s-agentpool2-98765432-vmss -2021-01-14T17:19:09.384511983Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype'] -2021-01-14T17:19:10.871751248Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype' '--publisher' 'VMSS-Prototype-Pattern' '--offer' 'testCluster2' '--sku' 'k8s-agentpool2-98765432-vmss' '--os-type' 'Linux' '--os-state' 'generalized'] -2021-01-14T17:19:40.569567067Z k8s-agentpool1-98765432-vmss INFO: Processing VMSS k8s-agentpool1-98765432-vmss -2021-01-14T17:19:40.569802566Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 
'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype'] -2021-01-14T17:19:42.115806228Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype' '--publisher' 'VMSS-Prototype-Pattern' '--offer' 'testCluster2' '--sku' 'k8s-agentpool1-98765432-vmss' '--os-type' 'Linux' '--os-state' 'generalized'] -2021-01-14T17:19:44.743378836Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype'] -2021-01-14T17:19:46.607442965Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-98765432-vmss'] -2021-01-14T17:19:47.983195601Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss' '--instance-id' '9'] -2021-01-14T17:19:49.492404751Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool2-98765432-vmss000009' 'cluster-autoscaler.kubernetes.io/scale-down-disabled=true' '--overwrite'] -2021-01-14T17:19:49.663973270Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'cordon' 'k8s-agentpool2-98765432-vmss000009'] -2021-01-14T17:19:49.824122448Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' 
'--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool2-98765432-vmss000009'] -2021-01-14T17:20:06.040992877Z k8s-agentpool2-98765432-vmss INFO: ===> Completed in 16.22s: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool2-98765432-vmss000009'] # RC=0 -2021-01-14T17:20:06.041202276Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss' '--instance-ids' '9'] -2021-01-14T17:20:15.741634264Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype'] -2021-01-14T17:20:17.560708123Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-98765432-vmss'] -2021-01-14T17:20:18.897144860Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss' '--instance-id' '8'] -2021-01-14T17:20:20.486635498Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool1-98765432-vmss000008' 'cluster-autoscaler.kubernetes.io/scale-down-disabled=true' '--overwrite'] -2021-01-14T17:20:20.618506821Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['kubectl' 'cordon' 'k8s-agentpool1-98765432-vmss000008'] -2021-01-14T17:20:20.751892736Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' 
'--grace-period' '5' '--timeout' '15s' 'k8s-agentpool1-98765432-vmss000008'] -2021-01-14T17:20:36.709331591Z k8s-agentpool1-98765432-vmss INFO: ===> Completed in 15.96s: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool1-98765432-vmss000008'] # RC=0 -2021-01-14T17:20:36.710308586Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss' '--instance-ids' '8'] -2021-01-14T17:22:38.977292908Z k8s-agentpool2-98765432-vmss INFO: ===> Completed in 152.94s: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss' '--instance-ids' '9'] # RC=0 -2021-01-14T17:22:38.977719805Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-98765432-vmss' '--source' '/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/disks/k8s-agentpool2-25241k8s-agentpool2-252419OS__1_2f8c0e732ea14af6aff4177d8c746d05' '--tags' 'BuiltFrom=k8s-agentpool2-98765432-vmss000009' 'BuiltAt=2021-01-14 17:22:38.977176'] -2021-01-14T17:22:44.800534194Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'delete-instances' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss' '--instance-ids' '9' '--no-wait'] -2021-01-14T17:22:46.343851666Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'uncordon' 'k8s-agentpool2-98765432-vmss000009'] -2021-01-14T17:22:46.493289898Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 
'k8s-agentpool2-98765432-vmss000009' 'cluster-autoscaler.kubernetes.io/scale-down-disabled-'] -2021-01-14T17:22:46.647673405Z k8s-agentpool2-98765432-vmss INFO: Creating sig image version - this can take quite a long time... -2021-01-14T17:22:46.647949304Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype' '--gallery-image-version' '2021.01.14' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool2-98765432-vmss' '--tags' 'BuiltFrom=k8s-agentpool2-98765432-vmss000009' 'BuiltAt=2021-01-14 17:22:38.977176' '--storage-account-type' 'Standard_ZRS'] -2021-01-14T17:23:09.415517144Z k8s-agentpool1-98765432-vmss INFO: ===> Completed in 152.71s: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss' '--instance-ids' '8'] # RC=0 -2021-01-14T17:23:09.415769242Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-98765432-vmss' '--source' '/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/disks/k8s-agentpool1-25241k8s-agentpool1-252419OS__1_3f1fd62025714807acb50c6bca6f8414' '--tags' 'BuiltFrom=k8s-agentpool1-98765432-vmss000008' 'BuiltAt=2021-01-14 17:23:09.415412'] -2021-01-14T17:23:14.692467034Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'delete-instances' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss' '--instance-ids' '8' '--no-wait'] -2021-01-14T17:23:16.156635412Z k8s-agentpool1-98765432-vmss INFO: 
===> Executing command: ['kubectl' 'uncordon' 'k8s-agentpool1-98765432-vmss000008'] -2021-01-14T17:23:16.322424261Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool1-98765432-vmss000008' 'cluster-autoscaler.kubernetes.io/scale-down-disabled-'] -2021-01-14T17:23:16.461581046Z k8s-agentpool1-98765432-vmss INFO: Creating sig image version - this can take quite a long time... -2021-01-14T17:23:16.461918444Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype' '--gallery-image-version' '2021.01.14' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool1-98765432-vmss' '--tags' 'BuiltFrom=k8s-agentpool1-98765432-vmss000008' 'BuiltAt=2021-01-14 17:23:09.415412' '--storage-account-type' 'Standard_ZRS'] -2021-01-14T19:28:01.376992211Z k8s-agentpool2-98765432-vmss INFO: ===> Completed in 7514.73s: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype' '--gallery-image-version' '2021.01.14' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool2-98765432-vmss' '--tags' 'BuiltFrom=k8s-agentpool2-98765432-vmss000009' 'BuiltAt=2021-01-14 17:22:38.977176' '--storage-account-type' 'Standard_ZRS'] # RC=0 -2021-01-14T19:28:01.377231610Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-98765432-vmss'] -2021-01-14T19:28:33.005925221Z k8s-agentpool2-98765432-vmss INFO: Latest image: 
/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool2-98765432-vmss-prototype/versions/2021.01.14 -2021-01-14T19:28:33.006624417Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-98765432-vmss-prototype'] -2021-01-14T19:28:35.599908237Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss'] -2021-01-14T19:28:37.046549773Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss' '--set' 'virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool2-98765432-vmss-prototype' 'virtualMachineProfile.storageProfile.imageReference.sku=null' 'virtualMachineProfile.storageProfile.imageReference.offer=null' 'virtualMachineProfile.storageProfile.imageReference.publisher=null' 'virtualMachineProfile.storageProfile.imageReference.version=null' 'virtualMachineProfile.osProfile.customData=I2Nsb3VkLWNvbmZpZwo='] -2021-01-14T19:28:49.809322404Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-98765432-vmss'] -2021-01-14T19:28:51.314872103Z k8s-agentpool2-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' 
'00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool2-98765432-vmss'] -2021-01-14T20:56:38.727277567Z k8s-agentpool1-98765432-vmss INFO: ===> Completed in 12802.26s: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype' '--gallery-image-version' '2021.01.14' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool1-98765432-vmss' '--tags' 'BuiltFrom=k8s-agentpool1-98765432-vmss000008' 'BuiltAt=2021-01-14 17:23:09.415412' '--storage-account-type' 'Standard_ZRS'] # RC=0 -2021-01-14T20:56:38.727488666Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'snapshot' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-98765432-vmss'] -2021-01-14T20:57:11.427005726Z k8s-agentpool1-98765432-vmss INFO: Latest image: /subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool1-98765432-vmss-prototype/versions/2021.01.14 -2021-01-14T20:57:11.428335219Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-98765432-vmss-prototype'] -2021-01-14T20:57:14.189644036Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss'] -2021-01-14T20:57:15.784717648Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '00000000-0000-0000-0000-000000000001' 
'--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss' '--set' 'virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool1-98765432-vmss-prototype' 'virtualMachineProfile.storageProfile.imageReference.sku=null' 'virtualMachineProfile.storageProfile.imageReference.offer=null' 'virtualMachineProfile.storageProfile.imageReference.publisher=null' 'virtualMachineProfile.storageProfile.imageReference.version=null' 'virtualMachineProfile.osProfile.customData=I2Nsb3VkLWNvbmZpZwo='] -2021-01-14T20:57:28.722609171Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-98765432-vmss'] -2021-01-14T20:57:30.235270184Z k8s-agentpool1-98765432-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool1-98765432-vmss'] -2021-01-14T20:57:31.820532543Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' 'status' '--resource-group' 'testCluster2' '--subscription' '00000000-0000-0000-0000-000000000001'] -2021-01-14T20:57:43.674497253Z VMSS Prototype Status for cluster: -2021-01-14T20:57:43.674569253Z testCluster2: k8s-agentpool1-98765432-vmss: VMSS Prototype Image Version 2021.01.14 - Succeeded - BuiltFrom: k8s-agentpool1-98765432-vmss000008 @ 2021-01-14 17:23:09.415412 -2021-01-14T20:57:43.674583653Z testCluster2: k8s-agentpool1-98765432-vmss: Configured to use latest VMSS Prototype Image Definition -2021-01-14T20:57:43.674592653Z testCluster2: k8s-agentpool2-98765432-vmss: VMSS Prototype Image Version 2021.01.14 - Succeeded - BuiltFrom: k8s-agentpool2-98765432-vmss000009 @ 2021-01-14 17:22:38.977176 
-2021-01-14T20:57:43.674604853Z testCluster2: k8s-agentpool2-98765432-vmss: Configured to use latest VMSS Prototype Image Definition +2021-04-01T21:27:29.616180727Z CMD: ['/usr/bin/vmss-prototype' '--in-cluster' '--log-level' 'DEBUG' '--log-prefix' 'auto-update' 'auto-update' '--new-updated-nodes' '0' '--grace-period' '5' '--max-history' '3' '--last-patch-annotation' 'LatestOSPatch' '--pending-reboot-annotation' 'PendingReboot' '--minimum-ready-time' '5m' '--minimum-candidates' '1' '--maximum-image-age' '15'] +2021-04-01T21:27:29.616212928Z auto-update INFO: ===> Executing command: ['az' 'cloud' 'set' '--name' 'AzureCloud'] +2021-04-01T21:27:31.824017020Z auto-update INFO: ===> Executing command: ['az' 'login' '***'] +2021-04-01T21:27:32.789011628Z auto-update INFO: ===> Executing command: ['az' 'account' 'set' '--subscription' '00000000-0000-0000-0000-000000000001'] +2021-04-01T21:27:32.991444516Z auto-update INFO: ===> Executing command: ['kubectl' 'get' 'nodes' '--output' 'jsonpath={.items[*].metadata.name}'] +2021-04-01T21:27:33.657669815Z auto-update INFO: ===> Executing command: ['kubectl' 'get' 'nodes' '--output' 'json'] +2021-04-01T21:27:33.762430251Z auto-update INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype'] +2021-04-01T21:27:39.290835849Z auto-update DEBUG: Latest image for VMSS k8s-agentpool1-33778956-vmss is 0000.00.00 +2021-04-01T21:27:39.292895703Z auto-update DEBUG: CANDIDATE: Node k8s-agentpool1-33778956-vmss000000 ready for 1706s +2021-04-01T21:27:39.292927004Z auto-update DEBUG: CANDIDATE: Node k8s-agentpool1-33778956-vmss000001 ready for 2007s +2021-04-01T21:27:39.292941404Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-33778956-vmss000002 is the same as the node we are running on +2021-04-01T21:27:39.292944904Z auto-update 
DEBUG: IGNORED: Node k8s-agentpool2-33778956-vmss000000 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.292980705Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-33778956-vmss000001 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.292997706Z auto-update DEBUG: IGNORED: Node k8s-agentpool2-33778956-vmss000002 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.293042607Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-0 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.293052107Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-1 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.293055307Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-2 not part of vmss k8s-agentpool1-33778956-vmss +2021-04-01T21:27:39.293094508Z auto-update INFO: VMSS k8s-agentpool1-33778956-vmss: Picked candiate node k8s-agentpool1-33778956-vmss000001 from 2 candidates +2021-04-01T21:27:39.293262613Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool1-33778956-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool1-33778956-vmss000001' '--subscription' '00000000-0000-0000-0000-000000000001'] +2021-04-01T21:27:39.293562921Z auto-update INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype'] +2021-04-01T21:27:39.330095074Z CMD: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool1-33778956-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool1-33778956-vmss000001' '--subscription' 
'00000000-0000-0000-0000-000000000001'] +2021-04-01T21:27:39.330122074Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'node' 'k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:27:39.412174215Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2'] +2021-04-01T21:27:40.584345890Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--description' 'Kamino VMSS images'] +2021-04-01T21:27:44.163504306Z auto-update DEBUG: Latest image for VMSS k8s-agentpool2-33778956-vmss is 0000.00.00 +2021-04-01T21:27:44.163537507Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-33778956-vmss000000 not part of vmss k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163542307Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-33778956-vmss000001 not part of vmss k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163546107Z auto-update DEBUG: IGNORED: Node k8s-agentpool1-33778956-vmss000002 not part of vmss k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163815814Z auto-update DEBUG: CANDIDATE: Node k8s-agentpool2-33778956-vmss000000 ready for 2096s +2021-04-01T21:27:44.163830915Z auto-update DEBUG: CANDIDATE: Node k8s-agentpool2-33778956-vmss000001 ready for 1735s +2021-04-01T21:27:44.163908317Z auto-update DEBUG: CANDIDATE: Node k8s-agentpool2-33778956-vmss000002 ready for 1584s +2021-04-01T21:27:44.163913417Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-0 not part of vmss k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163915917Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-1 not part of vmss k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163934217Z auto-update DEBUG: IGNORED: Node k8s-master-33778956-2 not part of vmss 
k8s-agentpool2-33778956-vmss +2021-04-01T21:27:44.163998119Z auto-update INFO: VMSS k8s-agentpool2-33778956-vmss: Picked candiate node k8s-agentpool2-33778956-vmss000000 from 3 candidates +2021-04-01T21:27:44.164280726Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool2-33778956-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool2-33778956-vmss000000' '--subscription' '00000000-0000-0000-0000-000000000001'] +2021-04-01T21:27:44.200995983Z CMD: ['/usr/bin/vmss-prototype' '--log-level' 'DEBUG' '--log-prefix' 'k8s-agentpool2-33778956-vmss' 'update' '--resource-group' 'testCluster2' '--new-updated-nodes' '0' '--max-history' '3' '--grace-period' '5' '--target-node' 'k8s-agentpool2-33778956-vmss000000' '--subscription' '00000000-0000-0000-0000-000000000001'] +2021-04-01T21:27:44.201033184Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'node' 'k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:27:44.276734157Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2'] +2021-04-01T21:27:45.583577913Z k8s-agentpool2-33778956-vmss INFO: Processing VMSS k8s-agentpool2-33778956-vmss +2021-04-01T21:27:45.583735517Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype'] +2021-04-01T21:27:46.917185158Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' 
'--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype' '--publisher' 'VMSS-Prototype-Pattern' '--offer' 'testCluster2' '--sku' 'k8s-agentpool2-33778956-vmss' '--os-type' 'Linux' '--os-state' 'generalized'] +2021-04-01T21:28:14.376147106Z k8s-agentpool1-33778956-vmss INFO: Processing VMSS k8s-agentpool1-33778956-vmss +2021-04-01T21:28:14.376225808Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype'] +2021-04-01T21:28:15.317867417Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-definition' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype' '--publisher' 'VMSS-Prototype-Pattern' '--offer' 'testCluster2' '--sku' 'k8s-agentpool1-33778956-vmss' '--os-type' 'Linux' '--os-state' 'generalized'] +2021-04-01T21:28:20.673342370Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype'] +2021-04-01T21:28:24.046707807Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-33778956-vmss'] +2021-04-01T21:28:25.053591166Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 
'k8s-agentpool2-33778956-vmss' '--instance-id' '0'] +2021-04-01T21:28:26.390401057Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool2-33778956-vmss000000' 'cluster-autoscaler.kubernetes.io/scale-down-disabled=true' '--overwrite'] +2021-04-01T21:28:26.479078051Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'cordon' 'k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:28:26.563964648Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:28:40.504320825Z k8s-agentpool2-33778956-vmss INFO: ===> Completed in 13.94s: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool2-33778956-vmss000000'] # RC=0 +2021-04-01T21:28:40.504444729Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'delete' '-f' '-'] # Deleting vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000 (just in case) +2021-04-01T21:28:40.584632699Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'create' '-f' '-'] # Creating vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000 +2021-04-01T21:28:41.275190425Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:28:47.713661644Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype'] +2021-04-01T21:28:49.361687639Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' 
'--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:28:57.447647991Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:29:00.863528737Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-33778956-vmss'] +2021-04-01T21:29:01.863556175Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss' '--instance-id' '1'] +2021-04-01T21:29:02.944568794Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool1-33778956-vmss000001' 'cluster-autoscaler.kubernetes.io/scale-down-disabled=true' '--overwrite'] +2021-04-01T21:29:03.033361379Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'cordon' 'k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:29:03.121895957Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:29:05.531483249Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:29:13.619498082Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] 
+2021-04-01T21:29:19.154865560Z k8s-agentpool1-33778956-vmss WARNING: Attempt 1: Command failed with exit code 1. Retrying in 8s ... +2021-04-01T21:29:21.702121052Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:29:27.159340275Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool1-33778956-vmss000001'] # ... try #2 +2021-04-01T21:29:27.668187226Z k8s-agentpool1-33778956-vmss INFO: ===> Completed in 24.55s: ['kubectl' 'drain' '--ignore-daemonsets' '--delete-local-data' '--force' '--grace-period' '5' '--timeout' '15s' 'k8s-agentpool1-33778956-vmss000001'] # RC=0 +2021-04-01T21:29:27.668225227Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'delete' '-f' '-'] # Deleting vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001 (just in case) +2021-04-01T21:29:27.739045743Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'create' '-f' '-'] # Creating vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001 +2021-04-01T21:29:28.008288549Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:29:29.787488174Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'delete' 'pods' '--namespace' 'default' '--grace-period' '0' '--force' 'vmss-prototype-tweaks-k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:29:29.892609769Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss' '--instance-ids' 
'0'] +2021-04-01T21:29:36.092502476Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:29:42.851731280Z k8s-agentpool2-33778956-vmss WARNING: Attempt 1: Command failed with exit code 1. Retrying in 8s ... +2021-04-01T21:29:44.177357713Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:29:50.860212897Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss' '--instance-ids' '0'] # ... try #2 +2021-04-01T21:29:52.260737513Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:00.345906466Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:08.433159384Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:16.520461222Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:24.606762859Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'get' 'pod' '--namespace' 'default' '--output' 
'jsonpath={.status.phase}' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:32.694892671Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'delete' 'pods' '--namespace' 'default' '--grace-period' '0' '--force' 'vmss-prototype-tweaks-k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:30:32.780243344Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss' '--instance-ids' '1'] +2021-04-01T21:31:23.482551006Z k8s-agentpool2-33778956-vmss INFO: ===> Completed in 113.59s: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss' '--instance-ids' '0'] # RC=0 +2021-04-01T21:31:23.482602008Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-33778956-vmss' '--source' '/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/disks/k8s-agentpool2-33778k8s-agentpool2-337789OS__1_5d549e004ca5490e84c82a86f77dcb54' '--tags' 'BuiltFrom=k8s-agentpool2-33778956-vmss000000' 'BuiltAt=2021-04-01 21:31:23.482360'] +2021-04-01T21:32:28.599531814Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'start' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss' '--instance-ids' '0' '--no-wait'] +2021-04-01T21:32:29.782487153Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'uncordon' 'k8s-agentpool2-33778956-vmss000000'] +2021-04-01T21:32:29.876285219Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 
'k8s-agentpool2-33778956-vmss000000' 'cluster-autoscaler.kubernetes.io/scale-down-disabled-'] +2021-04-01T21:32:29.965644673Z k8s-agentpool2-33778956-vmss INFO: Creating sig image version - this can take quite a long time... +2021-04-01T21:32:29.965780377Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype' '--gallery-image-version' '2021.04.01' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool2-33778956-vmss' '--tags' 'BuiltFrom=k8s-agentpool2-33778956-vmss000000' 'BuiltAt=2021-04-01 21:31:23.482360' '--storage-account-type' 'Standard_ZRS'] +2021-04-01T21:32:35.083747047Z k8s-agentpool1-33778956-vmss INFO: ===> Completed in 122.30s: ['az' 'vmss' 'deallocate' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss' '--instance-ids' '1'] # RC=0 +2021-04-01T21:32:35.083828849Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-33778956-vmss' '--source' '/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/disks/k8s-agentpool1-33778k8s-agentpool1-337789OS__1_33c78c2ff87b4bbcad5516f88fab16c1' '--tags' 'BuiltFrom=k8s-agentpool1-33778956-vmss000001' 'BuiltAt=2021-04-01 21:32:35.083641'] +2021-04-01T21:32:43.910981266Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'start' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss' '--instance-ids' '1' '--no-wait'] +2021-04-01T21:32:47.055532206Z k8s-agentpool1-33778956-vmss INFO: ===> 
Executing command: ['kubectl' 'uncordon' 'k8s-agentpool1-33778956-vmss000001'] +2021-04-01T21:32:47.143864731Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['kubectl' 'annotate' 'node' 'k8s-agentpool1-33778956-vmss000001' 'cluster-autoscaler.kubernetes.io/scale-down-disabled-'] +2021-04-01T21:32:47.233459689Z k8s-agentpool1-33778956-vmss INFO: Creating sig image version - this can take quite a long time... +2021-04-01T21:32:47.233523790Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype' '--gallery-image-version' '2021.04.01' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool1-33778956-vmss' '--tags' 'BuiltFrom=k8s-agentpool1-33778956-vmss000001' 'BuiltAt=2021-04-01 21:32:35.083641' '--storage-account-type' 'Standard_ZRS'] +2021-04-01T21:43:34.575631298Z k8s-agentpool2-33778956-vmss INFO: ===> Completed in 664.61s: ['az' 'sig' 'image-version' 'create' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype' '--gallery-image-version' '2021.04.01' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool2-33778956-vmss' '--tags' 'BuiltFrom=k8s-agentpool2-33778956-vmss000000' 'BuiltAt=2021-04-01 21:31:23.482360' '--storage-account-type' 'Standard_ZRS'] # RC=0 +2021-04-01T21:43:34.575683699Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool2-33778956-vmss'] +2021-04-01T21:43:51.601287988Z k8s-agentpool1-33778956-vmss INFO: ===> Completed in 664.37s: ['az' 'sig' 'image-version' 'create' 
'--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype' '--gallery-image-version' '2021.04.01' '--replica-count' '3' '--os-snapshot' 'snapshot_k8s-agentpool1-33778956-vmss' '--tags' 'BuiltFrom=k8s-agentpool1-33778956-vmss000001' 'BuiltAt=2021-04-01 21:32:35.083641' '--storage-account-type' 'Standard_ZRS'] # RC=0 +2021-04-01T21:43:51.601376391Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'snapshot' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'snapshot_k8s-agentpool1-33778956-vmss'] +2021-04-01T21:44:05.711208309Z k8s-agentpool2-33778956-vmss INFO: Latest image: /subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool2-33778956-vmss-prototype/versions/2021.04.01 +2021-04-01T21:44:05.711379114Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool2-33778956-vmss-prototype'] +2021-04-01T21:44:18.030317600Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss'] +2021-04-01T21:44:19.028627080Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss' '--set' 
'virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool2-33778956-vmss-prototype' 'virtualMachineProfile.storageProfile.imageReference.sku=null' 'virtualMachineProfile.storageProfile.imageReference.offer=null' 'virtualMachineProfile.storageProfile.imageReference.publisher=null' 'virtualMachineProfile.storageProfile.imageReference.version=null' 'virtualMachineProfile.osProfile.customData=I2Nsb3VkLWNvbmZpZwo='] +2021-04-01T21:44:22.983444364Z k8s-agentpool1-33778956-vmss INFO: Latest image: /subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool1-33778956-vmss-prototype/versions/2021.04.01 +2021-04-01T21:44:22.983580667Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'sig' 'image-version' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--gallery-name' 'SIG_testCluster2' '--gallery-image-definition' 'kamino-k8s-agentpool1-33778956-vmss-prototype'] +2021-04-01T21:44:56.620522256Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss'] +2021-04-01T21:44:57.797330529Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'update' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss' '--set' 'virtualMachineProfile.storageProfile.imageReference.id=/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/testCluster2/providers/Microsoft.Compute/galleries/SIG_testCluster2/images/kamino-k8s-agentpool1-33778956-vmss-prototype' 'virtualMachineProfile.storageProfile.imageReference.sku=null' 
'virtualMachineProfile.storageProfile.imageReference.offer=null' 'virtualMachineProfile.storageProfile.imageReference.publisher=null' 'virtualMachineProfile.storageProfile.imageReference.version=null' 'virtualMachineProfile.osProfile.customData=I2Nsb3VkLWNvbmZpZwo='] +2021-04-01T21:45:17.289477356Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool1-33778956-vmss'] +2021-04-01T21:45:18.357102717Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool1-33778956-vmss'] +2021-04-01T21:45:19.556287420Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool1-33778956-vmss' '--name' 'vmssCSE'] +2021-04-01T21:45:33.188882506Z k8s-agentpool1-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool1-33778956-vmss' '--name' 'CustomScriptForLinux'] +2021-04-01T21:45:41.338982221Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'show' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--name' 'k8s-agentpool2-33778956-vmss'] +2021-04-01T21:45:42.398567018Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'list' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool2-33778956-vmss'] +2021-04-01T21:45:43.377803813Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' 
'00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool2-33778956-vmss' '--name' 'vmssCSE'] +2021-04-01T21:46:50.734914752Z k8s-agentpool2-33778956-vmss INFO: ===> Executing command: ['az' 'vmss' 'extension' 'delete' '--subscription' '00000000-0000-0000-0000-000000000001' '--resource-group' 'testCluster2' '--vmss-name' 'k8s-agentpool2-33778956-vmss' '--name' 'CustomScriptForLinux'] +2021-04-01T21:49:16.569691278Z auto-update INFO: ===> Executing command: ['/usr/bin/vmss-prototype' 'status' '--resource-group' 'testCluster2' '--subscription' '00000000-0000-0000-0000-000000000001'] +2021-04-01T21:50:14.808741788Z VMSS Prototype Status for cluster: +2021-04-01T21:50:14.808799390Z testCluster2: k8s-agentpool1-33778956-vmss: VMSS Prototype Image Version 2021.04.01 - Succeeded - BuiltFrom: k8s-agentpool1-33778956-vmss000001 @ 2021-04-01 21:32:35.083641 +2021-04-01T21:50:14.808806090Z testCluster2: k8s-agentpool1-33778956-vmss: Configured to use latest VMSS Prototype Image Definition +2021-04-01T21:50:14.808810590Z testCluster2: k8s-agentpool2-33778956-vmss: VMSS Prototype Image Version 2021.04.01 - Succeeded - BuiltFrom: k8s-agentpool2-33778956-vmss000000 @ 2021-04-01 21:31:23.482360 +2021-04-01T21:50:14.808815490Z testCluster2: k8s-agentpool2-33778956-vmss: Configured to use latest VMSS Prototype Image Definition ``` \ No newline at end of file diff --git a/vmss-prototype/Dockerfile b/vmss-prototype/Dockerfile index 699736f..fe6e4a1 100644 --- a/vmss-prototype/Dockerfile +++ b/vmss-prototype/Dockerfile @@ -4,8 +4,10 @@ # docker build . 
-t vmss-prototype:testing
 # See smoketest.sh for examples of how I run locally
+# Comes in at ~73MB
 FROM ubuntu:20.04
+# Currently adds ~66MB due to patches/updates and our requirements
 RUN apt-get update && \
  apt-get upgrade --yes && \
  apt-get install --no-install-suggests --no-install-recommends --yes \
@@ -20,19 +22,19 @@ RUN apt-get update && \
  update-alternatives --install /usr/bin/python python /usr/bin/python3 4 && \
  python -m pip install --upgrade pip && \
  python -m pip install \
- argcomplete \
  argparse \
  && \
  rm -rf /var/lib/apt/lists/* /root/.cache
-# Azure CLI
-RUN curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
+# Azure CLI (Currently ~789MB all by itself!)
+RUN echo "Install AzureCLI" && \
+ curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=true apt-key add - && \
  echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $(lsb_release -sc) main" > /etc/apt/sources.list.d/azure-cli.list && \
  apt-get -qq update && \
- apt-get -qq install --yes --no-install-suggests --no-install-recommends \
- azure-cli && \
+ apt-get -qq install --yes --no-install-suggests --no-install-recommends azure-cli && \
  rm -rf /var/lib/apt/lists/* /root/.cache
+# A tiny ~75KB
 COPY vmss-prototype /usr/bin/
-ENTRYPOINT [ "/usr/bin/vmss-prototype" ]
\ No newline at end of file
+ENTRYPOINT [ "/usr/bin/vmss-prototype" ]
diff --git a/vmss-prototype/smoketest2.sh b/vmss-prototype/smoketest2.sh
index 05a7fbe..9e26c1a 100755
--- a/vmss-prototype/smoketest2.sh
+++ b/vmss-prototype/smoketest2.sh
@@ -4,6 +4,10 @@ MY_ACR=skyman.azurecr.io
 MY_REPOSITORY=scratch/kamino/vmss-prototype
 MY_TAG=experimental.msinz
+# The namespace we want to deploy to
+NAMESPACE=default
+DEPLOYMENT_NAME=smoke-test2
+
 az acr login -n ${MY_ACR}
 IMAGE_TAG=${MY_ACR}/${MY_REPOSITORY}:${MY_TAG}
@@ -13,12 +17,13 @@ docker build . -t ${IMAGE_TAG}
 docker run --rm -i -t ${IMAGE_TAG} --help
 docker push ${IMAGE_TAG}
+docker history ${IMAGE_TAG}
 # The helm binary - I use helm3 as the name as we had to support both in the past
 HELM3=helm3
 # Get rid of any prior version (just in case)
-${HELM3} delete smoke-test 2>/dev/null >/dev/null || true
+${HELM3} delete ${DEPLOYMENT_NAME} 2>/dev/null >/dev/null || true
 # This is my smoke-test. I force the gracePeriod to be short
 # and that we will move forward even if drain fails just because
@@ -30,9 +35,11 @@ ${HELM3} delete smoke-test 2>/dev/null >/dev/null || true
 # my test cluster. If it is not included, this runs as a status
 # job. If a target node is included, it runs as an actual image
 # creation job.
-${HELM3} upgrade --install smoke-test ../helm/vmss-prototype \
- --namespace default \
- --set kamino.name=kamino-smoketest \
+${HELM3} upgrade --install ${DEPLOYMENT_NAME} ../helm/vmss-prototype \
+ --namespace ${NAMESPACE} \
+ --set kamino.labels.app=${DEPLOYMENT_NAME} \
+ --set kamino.logLevel=DEBUG \
+ --set kamino.name=kamino-${DEPLOYMENT_NAME} \
  --set kamino.container.imageRegistry=${MY_ACR} \
  --set kamino.container.imageRepository=${MY_REPOSITORY} \
  --set kamino.container.imageTag=${MY_TAG} \
@@ -42,12 +49,18 @@ ${HELM3} upgrade --install smoke-test ../helm/vmss-prototype \
  --set kamino.drain.force=true \
  #--set kamino.targetNode=k8s-agentpool1-18861755-vmss000007
-kubectl get jobs -lapp=kamino-vmss-prototype
+# Show the commands we are about to run
+set -x
+
+# Show the job...
+kubectl get jobs --namespace ${NAMESPACE} --selector app=${DEPLOYMENT_NAME}
+
+# Wait for the job to be ready
+kubectl wait --timeout 90s --for condition=Ready pods --namespace ${NAMESPACE} --selector app=${DEPLOYMENT_NAME}
+
+# Note that I do this here knowing that it will never exit and that
+# I am just watching it start/etc. That was the whole point.
+kubectl get pods --namespace ${NAMESPACE} --selector app=${DEPLOYMENT_NAME} --output wide -# We background start the watch on the pod and then wait for -# the job to complete and then get the logs -kubectl get pods -o wide -lapp=kamino-vmss-prototype -w & -pod_watcher=$? -kubectl wait jobs --for condition=Complete -lapp=kamino-vmss-prototype --timeout 600s -kubectl logs -lapp=kamino-vmss-prototype --tail 9999 --timestamps --follow -kill ${pod_watcher} +# Now show the logs (with --follow which will run until the job completes) +kubectl logs --timestamps --namespace ${NAMESPACE} --selector app=${DEPLOYMENT_NAME} --follow --tail 1000 diff --git a/vmss-prototype/vmss-prototype b/vmss-prototype/vmss-prototype index e13d7d2..8d2b9a7 100755 --- a/vmss-prototype/vmss-prototype +++ b/vmss-prototype/vmss-prototype @@ -1,6 +1,4 @@ #!/usr/bin/env python -# PYTHON_ARGCOMPLETE_OK -# The token above is necessary for Python bash auto-complete trigger # pylint: disable=line-too-long import argparse @@ -14,9 +12,6 @@ import sys import threading import time -# This is what hooks to the Python bash autocomplete -import argcomplete # type: ignore - def fatal_error(msg): """ @@ -386,14 +381,51 @@ def image_tweaks(node_name): # process actually causes the machine-id to be regenerated. We have to # force a poweroff after writing the file such that no service rewrites the # file as it is being shut down. + # We also need to clean out some files for cloud-init (azure) such that it + # does not think it was a prior instance. # However, this happens so quickly that the master node may not have gotten # the details as to the fact that the pod even started to run, so we have to # do a small sleep before doing the dirty deed. # We then notice that this is running and assume we can then, after a few # seconds, delete the pod such that kubernetes does not think about it again. 
- # We don't actually need to do that if the node is deleted as that will delete - # the pod (since there is no controller to create another) - # However, it is nice to clean up after onself. + # We also delete some of the cloud init data such that the node does not + # think it "existed" before - this removes the problem with IDNS/DDNS where + # the node sends a signal that it "used to be" another node and to undo that + # DNS entry. + # Finally, we also append to /var/log/ancestry.log such that we can, in each + # node, know the genetic ancestry of the node. (Actually do that first) + + # To make it easier to read, the commands are here with ';' separation + # NOTE: do not use single quotes within these strings + cleanup_commands = ';'.join([ + # Give kubernetes a few moments to notice we are running + # as the rest of this stuff really just happens quickly + '/bin/sleep 4', + '/bin/echo "$(/bin/date) VMSS-Prototype Donor: $(/bin/hostname)" >> /var/log/ancestry.log', + # Multiple lines so it is easier to read all of the different + # items we are cleaning up (removing) + '/bin/rm -rf' + # Our host name is not really ours in a prototype - clear it out + ' /etc/hostname' + # All of this cloud init data needs to be fresh too + ' /var/lib/cloud/data/*' + ' /var/lib/cloud/instance' + # Finally, this is just cleanup to prevent build up + # of garbage that can slow things down + ' /var/lib/cloud/instances/*' + ' /var/lib/waagent/history/*' + ' /var/lib/waagent/events/*' + # or have goals that are no longer valid + ' /var/lib/waagent/ExtensionsConfig.*.xml' + ' /var/lib/waagent/GoalState.*.xml' + ' /var/lib/waagent/*.manifest.xml' + '', + # Update an ancestry.log + # This forces the machine-id to be re-issued + '/bin/cp /dev/null /etc/machine-id', + # Finally, we need to power off now + '/bin/systemctl poweroff --force --message=VMSS-Prototype' + ]) # This is a pod yaml for our image tweak we will do on the target node # that will be our new prototype image. 
@@ -410,7 +442,7 @@ def image_tweaks(node_name): " - '--'\n" " - /bin/sh\n" " - -xc\n" - " - '/bin/sleep 2; /bin/cp /dev/null /etc/machine-id; /bin/systemctl poweroff --force --message=VMSS-Prototype'\n" + " - '{4}'\n" " image: '{2}'\n" " imagePullPolicy: IfNotPresent\n" " name: tweaks\n" @@ -426,7 +458,7 @@ def image_tweaks(node_name): " operator: Exists\n" " - effect: NoExecute\n" " operator: Exists\n" - ).format(pod_name, namespace, image, node_name) + ).format(pod_name, namespace, image, node_name, cleanup_commands) run(['kubectl', 'delete', '-f', '-'], stdin=yaml_text, @@ -449,10 +481,10 @@ def image_tweaks(node_name): # The pod will run rather quickly and then the node will shutdown. # We can't depend on being able to talk to the node but we can get the # phase of the pod. The retries are just to cover the time to pull - # the container. 60 here means we at most wait 5 minutes for the pull + # the container. 40 here means we at most wait 5 minutes for the pull # and run state - just in case. Rarely see this hit more than a small # number of iterations. - retries = 60 + retries = 40 phase = 'unknown' while phase not in ['Running', 'Succeeded', 'Failed'] and retries > 0: phase, _, _ = run(['kubectl', 'get', 'pod', @@ -462,9 +494,9 @@ def image_tweaks(node_name): ], check=False) # This should be at least twice the value used above such that # we can see that the pod has started and have waited a small bit - # before deleting it. We sleep 2 seconds above, so this 5 seconds + # before deleting it. We sleep 4 seconds above, so this 8 seconds # covers it. 
- time.sleep(5) + time.sleep(8) retries = retries - 1 run(['kubectl', 'delete', 'pods', @@ -869,31 +901,39 @@ def vmss_prototype_update(sub_args): ], retries=6) snapshot = json.loads(output) - # Delete the instance to prevent idns side-effects from future VMs coming online based on its prototype - # See https://github.com/jackfrancis/kamino/issues/26 - stdout, stderr, exit_code = run(az(['vmss', 'delete-instances'], subscription) + [ + # NOTE: In certain versions of aks-engine, restarting a VM after + # it was deallocated will cause cloud-init to write an empty + # azure.json file but the aks-engine CSE will not write the + # correct one as it thinks the node is all ready to run. + # For those cases, this start will leave the node un-responsive + # and it would need a "reimage" (or even better, reimage after + # the prototype image was created and set up successfully) + # Even in those specific aks-engine versions, it only causes + # a problem when the VM instance has not yet been a prototype + # based instance. Once it is a prototype based instance, this + # no longer is a problem. So we can just ignore this here as + # this only impacts certain aks-engine versions and only for the + # nodes that had not yet transitioned to a vmss-prototype + # All aks-engine versions 0.57.0 or later have this fixed. + stdout, stderr, exit_code = run(az(['vmss', 'start'], subscription) + [ '--resource-group', resource_group, '--name', vmss, '--instance-ids', instance_id, '--no-wait' ], retries=3, check=False, retry_func=not_found_no_retry) - if exit_code == 0: - nodeDeleted = True finally: - if not nodeDeleted: - # Let it be a productive member of the cluster again - # We ignore errors here since the VM may no longer exist - # We best-effort uncordon it. - run(['kubectl', 'uncordon', - target_node - ], retries=3, check=False, retry_func=not_found_no_retry) + # Let it be a productive member of the cluster again + # We best-effort uncordon it. 
+ run(['kubectl', 'uncordon', + target_node + ], retries=3, check=False, retry_func=not_found_no_retry) - # Allow the cluster from scaling away the node... - run(['kubectl', 'annotate', 'node', - target_node, - 'cluster-autoscaler.kubernetes.io/scale-down-disabled-' - ], retries=3, check=False, retry_func=not_found_no_retry) + # Allow the cluster to scale away the node if it wishes + run(['kubectl', 'annotate', 'node', + target_node, + 'cluster-autoscaler.kubernetes.io/scale-down-disabled-' + ], retries=3, check=False, retry_func=not_found_no_retry) # Build the image version from the snapshow we have output = vmss_build_sig_image(subscription, resource_group, sig_name, image_definition, version, snapshot) @@ -1613,9 +1653,6 @@ def main(): set_better_help(subparser) - # Enable command line auto-complete support - argcomplete.autocomplete(parser) - # Parse the arguments cmd_line_args = parser.parse_args()
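
Reviewer note on the `finally` block in `vmss_prototype_update`: with the instance now restarted rather than deleted, the uncordon and annotation removal run unconditionally. The pattern can be sketched in isolation as below. This is a minimal illustration, not the tool's actual code: `run_best_effort`, `with_node_restored` and the `runner` parameter are hypothetical names, and the sketch does not reproduce the `retry_func` behaviour of the real `run` helper.

```python
import subprocess


def run_best_effort(command, retries=3):
    """Hypothetical helper: run a command, retry on failure, never raise.

    Returns the exit code of the last attempt (0 on success).
    """
    exit_code = 1
    for _ in range(retries):
        try:
            exit_code = subprocess.run(command, check=False).returncode
        except OSError:
            # Binary missing or not executable: count it as a failed attempt
            exit_code = 127
        if exit_code == 0:
            break
    return exit_code


def with_node_restored(target_node, operation, runner=run_best_effort):
    """Run operation() and always try to hand the node back to the cluster."""
    try:
        return operation()
    finally:
        # Best-effort: a failure here must not mask an error from operation()
        runner(['kubectl', 'uncordon', target_node])
        # A trailing '-' on the annotation key tells kubectl to remove it
        runner(['kubectl', 'annotate', 'node', target_node,
                'cluster-autoscaler.kubernetes.io/scale-down-disabled-'])
```

Passing the runner as a parameter keeps the cleanup order testable without a cluster: a fake runner can record the commands and return a non-zero exit code to confirm that the original exception still propagates.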