v0.4.0
v0.4.0, the stability-focused release
Welcome to v0.4 / v1alpha4. This release is a major step forward in our API and code stability. It contains a number of breaking changes and a long list of bug fixes.
🔦 Highlights
- Our test matrix for both end-to-end and integration tests has been greatly extended in this release.
- KubeadmControlPlane now supports automatic remediation (when set up with a MachineHealthCheck), more mutable spec fields, and a customizable rollout strategy.
- MachineHealthCheck is now more flexible, with new capabilities such as external remediation and skipping remediation on paused machines.
- Externally managed infrastructure: infrastructure providers can let external systems reconcile their own InfraCluster objects (e.g. AzureCluster, AWSCluster).
- Clusterctl has been improved, with a new `generate` command to replace `config`, an option to view optional template variables and their defaults, and a new `alpha rollout` command.
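As a rough illustration of the externally managed infrastructure highlight, the Go sketch below shows how a provider controller might skip InfraCluster objects that belong to an external system. It uses a plain annotations map instead of real Kubernetes objects and assumes the marker is the `cluster.x-k8s.io/managed-by` annotation introduced by #4303; the helpers `isExternallyManaged` and `shouldReconcile` are hypothetical, not the actual Cluster API predicate.

```go
package main

import "fmt"

// managedByAnnotation is assumed to be the key Cluster API v1alpha4 uses to
// mark an InfraCluster as managed by an external system.
const managedByAnnotation = "cluster.x-k8s.io/managed-by"

// isExternallyManaged reports whether the annotation key is present.
// Only presence matters in this sketch, not the value. Hypothetical helper.
func isExternallyManaged(annotations map[string]string) bool {
	_, ok := annotations[managedByAnnotation] // lookups on a nil map are safe
	return ok
}

// shouldReconcile sketches a provider-side filter: externally managed
// objects are left for the external system to reconcile.
func shouldReconcile(annotations map[string]string) bool {
	return !isExternallyManaged(annotations)
}

func main() {
	providerOwned := map[string]string{}
	external := map[string]string{managedByAnnotation: "my-external-system"}
	fmt.Println(shouldReconcile(providerOwned)) // true
	fmt.Println(shouldReconcile(external))      // false
}
```

With this kind of filter in place, the external system that sets the annotation is the one responsible for reconciling the InfraCluster.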
Upgrading from v0.3
Users
To upgrade a running management cluster based on v0.3.x and the v1alpha3 APIs, follow the instructions in the book.
Providers
For all providers or developers implementing Cluster API, please follow the dedicated instructions in the book.
📝 Proposals
- Cluster API Provider Operator (#3833)
- Insulate users from kubeadm API version changes (#4170)
- Add support for infrastructure cluster resources to be managed externally (#4135)
- Windows Support (#3616)
- Replace ExternalRemediationTemplate with RemediationTemplate in MHC proposal (#4528)
- Update KCP proposal disambiguating healthcheck (#4093)
- Update KCP proposal with scale in (#3857)
- Update spot instances proposal with interruptible label setting (#3817)
⚠️ Breaking Changes
- Update Go to `1.16`
- Update Kind to `v0.11.x` (#3815)
- Upgrade cert-manager to `v1.1.0` (#4013)
- The management cluster minimum Kubernetes version is `v1.19.1` (#3746)
- Use separate service accounts for each manager (#4245)
- Remove `kube-rbac-proxy` and expose metrics on `localhost:8080` (#4640)
- MachineDeployment `MaxUnavailable` and `MaxSurge` values: drop usage of deprecated `GetValueFromIntOrPercent` in favour of `GetScaledValueFromIntOrPercent` (#4532)
small impact: The only allowed values are integers, or strings containing an integer followed by a percent sign, e.g. `5%`.
- MachineHealthCheck: the node startup timeout countdown now starts only once the control plane is initialized and the cluster infrastructure is ready (#3752)
- The Machine template metadata of MachineDeployment, MachineSet, and MachinePool now only exposes labels and annotations (#4363)
impact: The metadata section for templated objects (for example, MachineDeployment's `spec.template.metadata`) exposed non-functional fields such as name, generateName, namespace, and ownerReferences; these fields weren't used anywhere in the codebase and have been removed.
- Align flag names with upstream Kubernetes components (#3934)
impact: `--metrics-addr` ==> `--metrics-bind-addr`, `--leader-election` ==> `--leader-elect`
- Run mutating, validating, and conversion webhooks with managers (#3985)
impact: Previously, webhooks ran in a separate namespace (`capi-webhook-system`) from the managers. To simplify our deployment model, our published images and binaries now run the webhook server by default.
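To illustrate the `GetScaledValueFromIntOrPercent` change, here is a standard-library Go sketch of its validation and scaling rules as we understand them. The `intOrString` type and `scaledValue` function are hypothetical stand-ins for apimachinery's `intstr.IntOrString` and `intstr.GetScaledValueFromIntOrPercent`; rounding up for `MaxSurge` and down for `MaxUnavailable` follows the usual Kubernetes Deployment convention and is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// intOrString is a minimal, hypothetical stand-in for intstr.IntOrString.
type intOrString struct {
	isString bool
	intVal   int
	strVal   string
}

// scaledValue returns integers unchanged; strings must be an integer followed
// by "%" and are scaled against total, rounding up or down as requested.
func scaledValue(v *intOrString, total int, roundUp bool) (int, error) {
	if v == nil {
		return 0, errors.New("nil value for IntOrString")
	}
	if !v.isString {
		return v.intVal, nil
	}
	if !strings.HasSuffix(v.strVal, "%") {
		// Plain numeric strings such as "5" are rejected.
		return 0, fmt.Errorf("invalid value %q: string is not a percentage", v.strVal)
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(v.strVal, "%"))
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %w", v.strVal, err)
	}
	scaled := float64(pct) * float64(total) / 100
	if roundUp {
		return int(math.Ceil(scaled)), nil
	}
	return int(math.Floor(scaled)), nil
}

func main() {
	replicas := 10
	pct := &intOrString{isString: true, strVal: "25%"}
	surge, _ := scaledValue(pct, replicas, true)        // 2.5 rounded up
	unavailable, _ := scaledValue(pct, replicas, false) // 2.5 rounded down
	fmt.Println(surge, unavailable)                     // 3 2

	_, err := scaledValue(&intOrString{isString: true, strVal: "5"}, replicas, false)
	fmt.Println(err != nil) // true
}
```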
Clusterctl
- Add `--raw` flag for clusterctl generate provider subcommand (#4768)
- Use native zsh completion (#4113)
- A clusterctl built for v1alpha4 should not upgrade to the v1alpha3 contract (#4202)
- The DigitalOcean infrastructure provider was renamed from `do` to `digitalocean` (#3809)
- Deprecate `clusterctl config` in favor of `clusterctl generate` (#4584)
Kubeadm Control Plane
- Move `spec.nodeDrainTimeout` to `spec.machineTemplate.nodeDrainTimeout` (#4815)
- Rename `spec.upgradeAfter` to `spec.rolloutAfter` (#4535)
- Stop updating and using kubeadm's `ClusterStatus` starting with Kubernetes v1.22 (#4643)
- Support metadata for machines under `spec.machineTemplate.metadata` and propagate it to all templated resources (#4644)
Kubeadm Bootstrapper
- Remove deprecated Machine `spec.bootstrap.data` (#4000)
- Default Kubelet cgroupDriver to systemd for Kubernetes >= 1.21 (#4236)
- Generate kubeadm config files under `/run/kubeadm` instead of `/tmp` (#3776)
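The new kubelet cgroupDriver default can be sketched as a small version check. `defaultCgroupDriver` is a hypothetical helper, not the bootstrap provider's code: it assumes a value set explicitly by the user is always kept, and it returns an empty string (meaning "leave the kubelet default alone") for versions older than v1.21.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultCgroupDriver keeps any user-provided value; otherwise it returns
// "systemd" for Kubernetes >= v1.21 and "" for older versions.
// Hypothetical helper for illustration.
func defaultCgroupDriver(kubernetesVersion, userValue string) (string, error) {
	if userValue != "" {
		return userValue, nil // explicit user setting wins (assumption)
	}
	// "v1.21.1" -> ["1", "21", "1"]; pre-release suffixes stay in the third part.
	parts := strings.SplitN(strings.TrimPrefix(kubernetesVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid Kubernetes version %q", kubernetesVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", fmt.Errorf("invalid major version in %q: %w", kubernetesVersion, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("invalid minor version in %q: %w", kubernetesVersion, err)
	}
	if major > 1 || (major == 1 && minor >= 21) {
		return "systemd", nil
	}
	return "", nil
}

func main() {
	for _, v := range []string{"v1.20.7", "v1.21.1", "v1.22.0"} {
		driver, err := defaultCgroupDriver(v, "")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %q\n", v, driver)
	}
}
```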
Experimental Features
- Change the `MachinePool` experimental API group to cluster.x-k8s.io (#4574)
- Remove unused MachinePool `spec.strategy` (#3990)
⚠️ 👩‍💻 Breaking Changes for developers
- Update Controller Runtime to `v0.9.x` (#4752)
- Introduce the `sigs.k8s.io/cluster-api/test` Go Module (#4713)
moderate impact: importing the test E2E framework or the Docker infrastructure provider (CAPD) now requires a new Go module dependency and a `replace` directive for Cluster API in your `go.mod`.
- Move envtest setup under internal/envtest (#4698)
small impact: the package does not work outside of the Cluster API repository.
- Remove `helpers.NewFakeClientWithScheme` (#4690)
small impact: The function was previously needed because the Controller Runtime fake client did not initialize objects' ResourceVersion; this has been fixed upstream, so the function has been removed.
- Accept options in `remote.NewClusterCacheTracker` (#4693)
small impact: This function previously accepted positional arguments, which have been replaced by an options struct.
- Remove `RequeueAfterError` (#3929)
- Unexport MachineHealthCheck `patchUnhealthyTargets` method (#4579)
- Remove Kubeadm DNS type field from types (#4547, #4516)
impact: This field was always defaulted to `coreDNS` and was immutable.
- Remove ClusterConfiguration.UseHyperKubeImage from v1alpha4 (#4545)
- Clean up deprecated variables/functions in v1alpha4 (#4078)
- Remove the example provider (#3992)
- Move version package from `cmd/version` to `version` (#4070)
- Add GVK object validation to patch helper (#4212)
minimal impact: This change adds extra validation to the helper library: a patch helper created for a specific GVK can only be used with that GVK throughout its lifetime.
- Set user agent and timeout for remote cluster client (#4060)
- Remove deprecated `DeleteNodeAnnotation` annotation (#3955)
- Add required coordination/leases RBAC for the new default Controller Runtime manager leader election method (#3756)
small impact: RBAC permissions have been updated for all managers.
- Add sentinel file to signal successful bootstrapping (#4084)
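The patch helper GVK validation (#4212) can be pictured with a stripped-down model. `groupVersionKind`, `patchHelper`, and its `Patch` method are hypothetical simplifications, not Cluster API's `util/patch` API; the sketch only captures the rule that a helper stays bound to the GVK it was created for.

```go
package main

import "fmt"

// groupVersionKind is a simplified stand-in for schema.GroupVersionKind.
type groupVersionKind struct {
	Group, Version, Kind string
}

// patchHelper is a hypothetical model that remembers the GVK of the object
// it was created for.
type patchHelper struct {
	gvk groupVersionKind
}

func newPatchHelper(gvk groupVersionKind) *patchHelper {
	return &patchHelper{gvk: gvk}
}

// Patch rejects any object whose GVK differs from the helper's original GVK.
func (h *patchHelper) Patch(gvk groupVersionKind) error {
	if gvk != h.gvk {
		return fmt.Errorf("patch helper created for %v cannot patch %v", h.gvk, gvk)
	}
	// A real helper would compute a diff against the original object here.
	return nil
}

func main() {
	machine := groupVersionKind{"cluster.x-k8s.io", "v1alpha4", "Machine"}
	cluster := groupVersionKind{"cluster.x-k8s.io", "v1alpha4", "Cluster"}
	helper := newPatchHelper(machine)
	fmt.Println(helper.Patch(machine))        // <nil>
	fmt.Println(helper.Patch(cluster) != nil) // true
}
```

In practice, this means creating a new helper per object kind rather than reusing one across kinds.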
Test Framework
- Add MachinePool to log collector (#4575)
- Remove deprecated functions (#3741, #3742)
- Resolve `CNI_RESOURCES` without using env vars (#3896)
- Wait for all the machines to exist again after remediation (#4415)
- Add result parameter to ApplyClusterTemplateAndWait (#4125)
- Use Kind's default network in CAPD (#4002)
Clusterctl library
- Make clusterctl support for cert-manager more flexible (#4748)
- Add `--raw` flag for clusterctl generate provider subcommand (#4768)
- Adapt clusterctl to webhooks deployed with managers (#4297)
- Rename clusterctl `client/inventory.GetDefaultProvider<>` to `GetProvider<>` (#4696)
- Remove clusterctl `--watching-namespace` (#4666)
- Remove clusterctl management groups (#4668)
- Remove clusterctl delete `--namespace` flag (#4674)
- Block execution when used with v1alpha3 management clusters (#4199)
- Deprecate `Provider.WatchedNamespace` (#4694)
- Remove embedded metadata from clusterctl (#4033)
- Remove hard code manifest version and hash (#3918)
✨ New Features
- Add Cluster API Provider Nested (#4792, #4793)
- Add Cluster API Provider GCP (#4001)
- CAPD: Add ipv6 support (#4558)
- Add watch label to allow multiple manager instances (#4119)
- Add externally managed annotation and predicate (#4303)
- Clusterctl: Show required variables and defaults in `clusterctl generate cluster <name> --list-variables` (#4645)
- Clusterctl: `alpha rollout pause/resume/undo/restart` for MachineDeployments (#4054, #4098, #3838)
- KubeadmControlPlane: Make NTP settings mutable in webhook validations (#4798)
- KubeadmControlPlane: Add rollout strategy support for KCP (#4073)
- KubeadmControlPlane: Make KCP Spec mutable (#3994)
- KubeadmControlPlane: KCP remediation (#3956)
- KubeadmControlPlane: Mark specific KCP machines with delete annotation for scaling down (#3948)
- MachineDeployment: Support deletePolicy (#3773)
- MachineDeployment: Add support for `OnDelete` rollout strategy (#4346)
- MachineDeployment: Add annotation support to `util.CloneTemplate` to pass them down from templates to machines (#4568)
- MachineHealthCheck: Allow users to disable NodeStartupTimeout (#4471)
- MachineHealthCheck: Add support to skip Machine remediation, and respect paused Machines (#4168)
- MachineHealthCheck: Add `remediationsAllowed` field to status (#3884)
- MachineHealthCheck: Support external remediation (#3882)
- MachineHealthCheck: Add MachineHealthCheck conditions to the Machine Ready condition (#3705)
- MachineHealthCheck: Support range of values for unhealthy machines in machine health check spec (#4128)
- Label interruptible nodes (#3668)
🐛 Bug Fixes
- Cluster: Include MachinePool objects in descendant count when deleting a Cluster (#4295)
- Machine: `status.phase` should be `provisioned` when there is a ProviderID and no Node yet (#4787)
- Machine: Add ability for the nodeToMachines mapper to filter by Cluster's name and namespace (#4513)
- Machine: Node deletion should check the cause of the error (#3966)
- MachineDeployment: Normalize version validation (#4670)
- MachineDeployment: Check Strategy is not nil to avoid panic (#4511)
- MachineSet: Add check for empty maps (#4171)
- MachineSet: Include Machines in deleting state when calculating replicas (#3434)
- KubeadmControlPlane: Allow remediation when the etcd member being remediated is missing (#4561)
- KubeadmControlPlane: Fix CoreDNS upgrade from `v1.20` to `v1.21` (#4476)
- KubeadmControlPlane: Register KubeadmControlPlane's scale subresource to validate scaling (#4366)
- KubeadmControlPlane: upgrades should use the list of Machines for scaling decision (#4376)
- KubeadmControlPlane: Fix the observedGeneration update (#4393)
- KubeadmControlPlane: Wait for MachinePools to be deleted before deleting control plane Machines (#4646)
- KubeadmControlPlane: Fix nil pointer dereference in webhook (#4175)
- KubeadmControlPlane: adopt v1alpha2 kubeconfig secrets (#4034)
- KubeadmControlPlane: Add a new condition `MachinesCreatedCondition` to indicate when template cloning fails (#3799)
- KubeadmBootstrap: prevent duplicated files in cloudinit for cloudinit.NewNode (#4630)
- KubeadmBootstrap: Fix `kube-proxy` to account for Linux security update (#4717)
- KubeadmBootstrap: Set `error_exit` code arg in kubeadm bootstrap script (#4079)
- KubeadmBootstrap: should not log the token upon renewal (#3774)
- KubeadmControlPlane: Fix remediation when the node name and etcd member name are not the same as the Machine's name (#4240)
- KubeadmControlPlane: Generate etcd client if at least one member is healthy (#3946)
- KubeadmControlPlane: reconcileEtcdMembers should use its own NodeRefs (#3964)
- KubeadmControlPlane: Scale-down checks exclude machines about to be deleted (#3977)
- KubeadmControlPlane: Relax update validation to allow rotating ssh keys (#3928)
- KubeadmControlPlane: use a live client when listing machines (#3759)
- KubeadmControlPlane: Prevent `reconcileEtcdMember` from removing etcd members when etcd starts slowly (#3962)
- MachineHealthCheck: Pass the Cluster into getTargets (#4367)
- MachineHealthCheck: sort status targets to avoid changing status (#3999)
- ClusterResourceSet: don't use a predicate on secret Type with metadata objects (#4723)
- ClusterResourceSet: Fix not getting Secret/Configmap TypeMeta information (#4129)
- DockerProvider: do not reconcile machine if cluster or machine is paused (#4453)
- DockerProvider: Fix MachinePool status update (#4208)
- DockerProvider: Handle stopped containers (#4071)
- Clusterctl: Allow generate cluster `--from` on empty clusters (#4553)
- Fix how we set annotations in conversion webhooks (#4688)
- Fix annotations.AddAnnotations when the annotations were previously nil (#4373)
- Fix ObjectTracker to allow retry on error (#4186)
- Avoid reporting health check error when the cache is stopped (#4067)
- Use nonroot user with id in our container builds (#4064)
💚 Testing
- Upgrade 1.18/1.19 kindest/node images to latest patch version (#4663)
- Add validation testing for defaulting (#4448)
- Invoke ginkgo in kubetest through entrypoint (#4662)
- Fix clusterproxy interface in E2E (#4371)
- Avoid masking possible errors in flaky CRS test (#4368)
- Skip E2E Conformance tests labeled as Serial (#4083)
- Change CI bucket for conformance tests to the non-bazel build (#4250)
- Use a different object name for each ClusterResourceSet test case (#4081)
- Test environments should wait for manager before running tests (#4086)
- Support 'file' scheme in component source (#4307)
- Increase timeout in ClusterResourceSet controller unit tests (#4076)
- Conformance tests now use kind network (#4042)
- Address workload CoreDNS test flakes (#3870)
- The util/patch tests shouldn't use a non-existent namespace (#3757)
- Assertion of Machine condition should be in an Eventually (#4692)
- Allow E2E tests to install more than one infra provider (#4791)
- Make node drain delete timeout configurable in E2E framework (#4830)
- Decouple control plane status check logic (#4719)
- Fix E2E tests using a single context object (#4706)
- Add unit tests to NewJoinControlPlane (#4710)
- Add E2E for scale in rollout (#4347)
- Fix E2E log statements printing pointers (#4701)
- Allow choosing the node image for the bootstrap cluster in framework (#3750)
- Add reverse conversion fuzz test (#3877)
- E2E test to upgrade workload cluster (#4130)
- E2E test: Allow extra args to be passed during kubectl apply (#4354)
- Collect bootstrap cluster logs in E2E tests (#4038)
- Add MachineDeployment scale test (#4647)
- Pin cgroupDriver to cgroupfs (#4614)
- Add E2E tests for workload cluster with Kubernetes from ci/latest (#3916)
- kubernetesversions: Add option to inject CI artifacts into a KubeadmConfig for a MachinePool (#3730)
- Add unit test coverage for machineDeployments reconcileOldMachineSets, reconcileNewMachineSet (#4498, #4495)
- Gather more Docker and containerd data/logs in ci-e2e.sh (#4414)
- Add Kubernetes conformance E2E test (#3811)
- Add E2E test for node drain timeout feature (#3818)
- Divide KCPUpgrade E2E test suite (#4683)
- Refactor tests to plain go in controllers (#4615)
- Refactor tests to plain go in controlplane/kubeadm/controllers (#4616)
- Refactor tests to plain go in controllers/remote (#4612)
- Refactor tests to plain go in exp/addons/controllers (#4602)
- Refactor tests to plain go in bootstrap/kubeadm/controllers (#4603)
- Refactor tests to plain go in util/patch (#4599)
- Refactor tests to plain go in exp/controllers (#4601)
- Refactor tests to plain go in util/collections (#4597)
- Implement E2E test for clusterctl upgrade (#3708)
- Fallback on build if kindest images are missing (#4397)
- Ease local CAPD E2E test execution (#4517)
- Fix upgrade test (CoreDNS verification) (#4470)
- Remove ginkgo from Kubeadm Controller tests (#4451)
- Add test for listDescendents fetch MachinePools (#4161)
🌱 Others
- Refactor golangci-lint config to remove false negatives (#4657)
- Use new setup-envtest binary to setup envtest (#4844)
- Remove lint exclude for file and directory permissions (#4831)
- Kubernetes 1.22 support - bump etcd client v3.5.0 (#4769)
- Rename ListVariableOnly flag in TemplateInput (#4785)
- Provide more information in clusterctl move logs (#4818)
- Refactor multiline yaml const (#4809)
- Use docker API instead of CLI in test framework (#4499)
- Reuse `hasMatchingLabels` in MachineSet controller (#4652)
- Enable golangci-lint for test/ submodule (#4780)
- Add build arg to override builder image (#4771)
- Align v1beta3 types to latest changes in Kubernetes (#4751)
- Move registering field indexes to the `noderefutil` package (#4722)
- Use ClientUncachedObjects name consistently (#4758)
- Set UserAgent field in all controller managers (#4750)
- Kubetest: Remove usage of Viper, converting config files to kubetest arguments (#4761)
- Introduce const for ClusterctlCoreLabel inventory value (#4726)
- Introduce const for CertManagerImageComponent value (#4725)
- Drop 420 dec default for secrets (#4724)
- Adapt clusterctl move to requirements for the new multi-tenancy model (#4628)
- Fix Tilt live reload for CAPD provider (#4718)
- Enable exportloopref, ifshort, and nilerr linters (#4649)
- Improve clusterctl generate cluster --list-variables (#4708)
- Clusterctl move should consider Secrets from provider's namespace (#4598)
- Introduce Kubeadm v1beta3 (#4691)
- Follow-up on the removal of clusterctl watching namespace (#4695)
- Use client-go global scheme as much as possible (#4689)
- CAPD: add support for btrfs and zfs (#4648)
- Remove KCP internal/hash unused package (#4637)
- Cleanup unused variable in bootstrap/kubeadm/api/v1alpha4/kubeadmconfig_webhook.go (#4606)
- Fix comment issue in controlplane/kubeadm/internal/etcd/etcd.go (#4607)
- Remove redundant judgment condition in controlplane/kubeadm/internal/workload_cluster_coredns_test.go (#4604)
- KCP should use some backoff when updating kubeadm configmap (#4613)
- Make KCP use embedded kubeadm types while manipulating the kubeadm-config ConfigMap (#4443)
- Update KCP remediation docs and messages to support > 1 replicas (#4594)
- Improve KCP conditions when node status is unknown (#4564)
- Refactor MachineSet scaling code and add more tests (#4506)
- Add a fuzzer function for *metav1.Time (#4544)
- Fix comment issue in method `MachineSetToDeployments` (#4540)
- Fix comment for UpdateImageRepositoryInKubeadmConfigMap (#4523)
- Rename unit test to TestCalculateStatus (#4505)
- Fix redundant judgment for reterr==nil (#4494)
- Fixes typo in machine types description (#4487)
- Include DaemonSets in list images in clusterctl (#4455)
- CAPD: print debug info on container creation error (#4432)
- Upgrade cloudbuild to use gcb-docker-gcloud with go 1.16 (#4431)
- KCP: include node list error in KCP and control plane machine conditions (#4421)
- CAPD: Ensure Loadbalancer IP is not empty (#4398)
- Remove deprecated ioutil usage (#4399)
- Minor CAPD Makefile cleanup (#4418)
- Remove CAPD hack/tools directory and run ensure-kustomize script (#4412)
- Include 'edited' to the list of PR types for golangci action (#4391)
- Add github action to run golangci-lint (#4374)
- Reorganize Make generate targets (#4362)
- Run golangci-lint in parallel and enable more linters (#4360)
- Remove most of bindata targets in favor of Go 1.16 embed (#4350)
- Refactor: standardize machine filter functions and improve testing (#4207)
- Sets InfraReady condition for MachinePool for incorrect external reference (#4335)
- Normalize flag names from `_` to `-` (#4332)
- Use variables in Makefile consistently (#4323)
- Use distroless for CAPD (#4298)
- Make RegisterClusterResourceSetConfigMapTransformation general (not only for CNI) (#4302)
- Fix the typo in error message (#4300)
- Stop gap for kubeadm types removal (#4227)
- Enable ClusterResourceSet by default (#4213)
- Update Cert Manager Tilt module to v1.1.0 as default (#4274)
- Update conversion error message with reasoning (#4267)
- Update golangci-lint config to use disable-all/enable (#4249)
- Clusterctl v1alpha4 should not install v1alpha3 providers (#4200)
- Remove verbose flag from make test (#4187)
- Switch published artifacts to k8s.gcr.io (#4214)
- Restore Fuzzer tests (#4211)
- Align kubeadm types (#4204)
- Add possibility to specify webhook cert dir (#4148)
- Refactor failure domains logic out of controlplane internal package (#4160)
- Use openAPI scheme for defaulting replicas (#4164)
- Include metadata for the CAPI releases (#4167)
- Make clusterctl completion zsh work with sourcing (#4169)
- Clean up kube version parsing (#4163)
- Remove verbose log line from CRS controller (#4149)
- Release daily builds and manifests to staging bucket (#4101)
- Migrate gcr.io/kubernetes-ci-images to gcr.io/k8s-staging-ci-images (#4107)
- Use debian-based container for CAPD (#4140)
- Switch KCP to use ClusterCacheTracker.GetClient for the workload cluster (#4104)
- Remove the image placeholder in release notes (#4108)
- Add E2E test for KCP remediation (#4094)
- Add optional parameter for using repolist to the kubetest library (#4069)
- Move MachineSet patching to a defer call (#3904)
- Add clusterctl describe cluster command (#3942)
- Increase leader election lease values for KCP (#3980)
- Annotate nodes with cluster info (#4048)
- DockerMachinePool should not use Status.instances from previous reconciliation (#3921)
- Collect pods logs from workload clusters in E2E tests (#4017)
- Remove deprecated fake.NewFakeClient and fake.NewFakeClientWithScheme (#3995)
- Update util functions for secret generation (#3979)
- Use 1.1 experimental dockerfile image and cache go/pkg/mod (#3972)
- Add v1alpha3 test templates alongside v1alpha4 ones (#3950)
- Add embedded metadata for v1alpha4 in clusterctl (#3949)
- Add KCP conditions, split reconcileHealth into preflight and reconcileEtcdMembers, make both use conditions (#3922)
- Fix small typo in clusterctl command docs (#3945)
- Prevent MachineHealthCheck from returning early on patch errors (#3901)
- Fix typo in clusterctl client test (#3888)
- Refactor controlplane health check in KCP (#3883)
- Add Node related condition to Machine conditions (#3890)
- Print provider type and name to match config file naming (#3871)
- Avoid draining when KCP object is being deleted (#3864)
- Upgrade corefile migration to v1.0.11 (#3853)
- Include docker output in error when we receive a non-zero exit code (#3823)
- Add Node watch to Machine controller (#3826)
- Remove RequeueAfterError from machine controller (#3723)
- CAPD webhooks should use 9443 as port (#3758)
- Use util.ManagerDelegatingClientFunc in all managers (#3755)
- Retrieve kustomize release binary instead of building it (#3744)
- Add healthprobe for bootstrap and controlplane (#4460)
- Extra validations for v1alpha3 -> v1alpha4 upgrade (#4230)
- Use uncached client and partial metadata for secret and configmaps (#4023)
- Add GetLiveClient() to ClusterCacheTracker (#3899)
- Forward-port modifies DockerMachine condition status to report for control plane to be ready (#3869)
- Use tilt cert_manager extension (#3775)
- Remove CAPD's machine deletion (#3592)
- Fix conditions counter in case of mismatching order of conditions between WithConditions and WithStepCounterIfOnly (#3740)
- Book pre-requisites should all use realpath (#4157)
- Update `clusterctl init --help` command (#3788)
📖 Additionally, there have been 77 contributions to our documentation and book. (#4813, #4797, #4781, #4789, #4782, #4788, #4778, #4772, #4773, #4765, #4682, #4704, #4686, #4664, #4626, #4661, #4627, #4600, #4592, #4219, #4567, #4500, #4550, #4551, #4530, #4504, #4135, #4501, #4496, #4492, #4480, #4327, #4427, #4333, #4459, #4328, #4402, #4345, #4385, #4285, #4340, #4348, #4306, #4301, #4170, #4237, #4220, #4228, #4215, #4188, #4189, #4153, #4121, #4127, #4092, #4100, #3833, #4074, #4051, #4003, #4018, #4021, #3976, #3960, #3953, #3943, #3932, #3905, #3754, #3827, #3814, #3820, #3616, #3777, #3724, #3736, #3737)
Thanks to all our contributors! 😊