
clarify how clusterctl overrides works #5818

Closed
namnx228 opened this issue Dec 8, 2021 · 17 comments · Fixed by #6551
Labels: area/clusterctl · area/upgrades · good first issue · help wanted · kind/documentation
Milestone: v1.2

Comments

namnx228 (Contributor) commented Dec 8, 2021

What steps did you take and what happened:
clusterctl upgrade plan | apply ignores the overrides directory and fetches the metadata.yaml file from upstream.
Here is my overrides structure:

.cluster-api/
├── clusterctl.yaml
├── overrides
│   └── infrastructure-metal3
│       └── v1.0.0
└── version.yaml

clusterctl.yaml:

KUBERNETES_VERSION: v1.22.2
NAMESPACE: metal3
NUM_OF_MASTER_REPLICAS: 3
NUM_OF_WORKER_REPLICAS: 1
NODE_DRAIN_TIMEOUT: 0s
MAX_SURGE_VALUE: 0

## Cluster network
CLUSTER_APIENDPOINT_HOST: 192.168.111.249
CLUSTER_APIENDPOINT_PORT: 6443
POD_CIDR: 192.168.0.0/18
SERVICE_CIDR: 10.96.0.0/12
PROVISIONING_POOL_RANGE_START: 172.22.0.100
PROVISIONING_POOL_RANGE_END: 172.22.0.200
PROVISIONING_CIDR: 24
BAREMETALV4_POOL_RANGE_START: 192.168.111.100
BAREMETALV4_POOL_RANGE_END: 192.168.111.200
BAREMETALV6_POOL_RANGE_START: fd55::100
BAREMETALV6_POOL_RANGE_END: fd55::200
EXTERNAL_SUBNET_V4_PREFIX: 24
EXTERNAL_SUBNET_V4_HOST: 192.168.111.1
EXTERNAL_SUBNET_V6_PREFIX: 64
EXTERNAL_SUBNET_V6_HOST: fd55::1

and this is the debug output from clusterctl upgrade plan:

$ clusterctl upgrade plan -v 9
Using configuration File="/home/ubuntu/.cluster-api/clusterctl.yaml"
Checking cert-manager version...
Cert-Manager is already up to date

Checking new release availability...
Fetching File="metadata.yaml" Provider="cluster-api" Type="CoreProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="kubeadm" Type="BootstrapProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="kubeadm" Type="ControlPlaneProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="cluster-api" Type="CoreProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="metal3" Type="InfrastructureProvider" Version="v0.5.3"

If the metal3 infrastructure provider's metadata.yaml were taken from the overrides directory, the last message would instead be:
Using Override="metadata.yaml" Provider="infrastructure-metal3" Version="v1.0.0"

What did you expect to happen:
When setting up the overrides directory for a provider, clusterctl upgrade plan | apply is expected to use the metadata.yaml file from that overrides directory instead of fetching it from upstream. This is the behavior with clusterctl v0.4.4; however, it no longer holds with >=v1.0.0.

Environment:

  • Cluster-api version: >=v1.0.0
  • Minikube/KIND version: kind version 0.11.1
  • Kubernetes version (use kubectl version): client v1.22.4, server v1.22.2
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS

/kind bug
/area upgrade clusterctl
k8s-ci-robot added the kind/bug and area/clusterctl labels on Dec 8, 2021
k8s-ci-robot (Contributor) commented:

@namnx228: The label(s) area/upgrade cannot be applied, because the repository doesn't have them.

In response to this: [the issue description, quoted above]

namnx228 (Contributor, Author) commented Dec 8, 2021

/area upgrades

k8s-ci-robot added the area/upgrades label on Dec 8, 2021
sbueringer (Member) commented Dec 8, 2021

I think the clusterctl.yaml has to contain something like this:

providers:
- name: "cluster-api"
  type: "CoreProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/cluster-api/v1.1.99/core-components.yaml"
- name: "kubeadm"
  type: "BootstrapProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/bootstrap-kubeadm/v1.1.99/bootstrap-components.yaml"
- name: "kubeadm"
  type: "ControlPlaneProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/control-plane-kubeadm/v1.1.99/control-plane-components.yaml"
- name: "docker"
  type: "InfrastructureProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/infrastructure-docker/v1.1.99/infrastructure-components.yaml"
overridesFolder: "/Users/buringerst/.cluster-api/dev-repository/overrides"

(or let's say at least that's how it works in CI and on my machine :))

namnx228 (Contributor, Author) commented Dec 8, 2021

@sbueringer I expect it to work even without the providers field in clusterctl.yaml. With clusterctl <v1.0.0, it used the metadata.yaml from the overrides directory without the providers field.
One thing I should add is that clusterctl init in my case still uses the overrides directory, so only clusterctl upgrade has this issue; a hypothetical init invocation is sketched below.
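(A hypothetical sketch; the exact init command used was not shared in this thread. An invocation that pins the provider version would look like:

$ clusterctl init --infrastructure metal3:v1.0.0

When a version is pinned explicitly, clusterctl resolves exactly that version, which would explain why init picks up the v1.0.0 files from the overrides directory.)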

sbueringer (Member) commented:

Ah okay, I wasn't aware of that. Let's see what other folks who know more about clusterctl have to say.

namnx228 (Contributor, Author) commented Dec 9, 2021

cc @fabriziopandini

fabriziopandini (Member) commented:

@ykakarap PTAL

ykakarap (Contributor) commented:

@namnx228 I see that you mentioned that clusterctl init picked the correct overrides. Can you share more details about this case?
Can you share the version of clusterctl used when running the init command? Also, if you have it, can you share the init command you used? Did you specify a version of the provider when performing init?

This will help me in debugging further. Thank you :)

namnx228 (Contributor, Author) commented:

Hi @ykakarap, sorry for the late answer.
After digging into the clusterctl source code, here are my findings on how it uses the overrides directory during an upgrade. To upgrade a specific provider:

  • First, it checks whether there is any information about this provider in clusterctl.yaml (the providers field, as in the comment above). If the providers field is not set, it queries GitHub for the list of released versions.
  • Then it checks whether the latest released version is available in the overrides directory. If it is, it uses the override and reads metadata.yaml from there. Otherwise, it fetches metadata.yaml from that provider's latest release on GitHub.

To sum up, if we put a test version of a provider in the overrides directory (a version that is not in the list of released versions) and we don't point to it in clusterctl.yaml, clusterctl upgrade will ignore the local override and fetch the latest release from GitHub.
This behavior is not really a bug in recent clusterctl versions, because the code has looked like this for a long time (two years!), but I suggest we update the docs to make it clear that the providers field in clusterctl.yaml is needed for the overrides directory to be used in the upgrade workflow; see the sketch below.
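A minimal sketch of such a providers entry for this scenario (the path and the standard infrastructure-components.yaml file name are assumptions; they must match the actual local overrides layout), so that clusterctl upgrade resolves the local v1.0.0 instead of the latest GitHub release:

providers:
- name: "metal3"
  type: "InfrastructureProvider"
  url: "/home/ubuntu/.cluster-api/overrides/infrastructure-metal3/v1.0.0/infrastructure-components.yaml"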

fabriziopandini (Member) commented Dec 16, 2021

@namnx228 thanks for reporting the result of your investigation!

/remove-kind bug
/kind documentation

/retitle clarify how clusterctl overrides works
The point to stress in the doc is that overrides only provide file replacements; provider version resolution, instead, is based only on the actual repository structure.

k8s-ci-robot changed the title from "clusterctl upgrade not use override repository" to "clarify how clusterctl overrides works" on Dec 16, 2021
k8s-ci-robot added the kind/documentation label and removed the kind/bug label on Dec 16, 2021
fabriziopandini (Member) commented Jan 26, 2022

/milestone v1.2

k8s-ci-robot added this to the v1.2 milestone on Jan 26, 2022
k8s-triage-robot commented Apr 26, 2022

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 26, 2022
fabriziopandini (Member) commented Apr 27, 2022

/good-first-issue

We should document the fact that overrides only provide file replacements, while provider version resolution is based only on the actual repository structure, with a note at the end of https://cluster-api.sigs.k8s.io/clusterctl/configuration.html#overrides-layer. The sketch below illustrates the point.
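To illustrate with a hypothetical layout based on this issue's scenario (the versions are assumptions): if the provider's repository only has releases up to v0.5.3, version resolution stops there, and overrides merely replace files for versions that resolution has already found:

~/.cluster-api/overrides/infrastructure-metal3/
├── v0.5.3
│   └── metadata.yaml   <- used: v0.5.3 exists in the repository, so this file replaces the fetched one
└── v1.0.0
    └── metadata.yaml   <- ignored during upgrade: v1.0.0 is not in the repository's release list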

k8s-ci-robot (Contributor) commented:

@fabriziopandini:
This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this: [the /good-first-issue comment, quoted above]

k8s-ci-robot added the good first issue and help wanted labels on Apr 27, 2022
mukul-kr (Contributor) commented:

/assign

k8s-triage-robot commented Jun 23, 2022

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot removed the lifecycle/stale label and added the lifecycle/rotten label on Jun 23, 2022
sbueringer (Member) commented Jun 23, 2022

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Jun 23, 2022