
clarify how clusterctl overrides works #5818

Closed
namnx228 opened this issue Dec 8, 2021 · 17 comments · Fixed by #6551
Labels: area/clusterctl · area/upgrades · good first issue · help wanted · kind/documentation
Milestone: v1.2

Comments

namnx228 (Contributor) commented Dec 8, 2021

What steps did you take and what happened:
clusterctl upgrade plan | apply ignores the overrides directory and fetches the metadata.yaml file from upstream.
Here is my overrides structure:

.cluster-api/
├── clusterctl.yaml
├── overrides
│   └── infrastructure-metal3
│       └── v1.0.0
└── version.yaml

clusterctl.yaml:

KUBERNETES_VERSION: v1.22.2
NAMESPACE: metal3
NUM_OF_MASTER_REPLICAS: 3
NUM_OF_WORKER_REPLICAS: 1
NODE_DRAIN_TIMEOUT: 0s
MAX_SURGE_VALUE: 0

## Cluster network
CLUSTER_APIENDPOINT_HOST: 192.168.111.249
CLUSTER_APIENDPOINT_PORT: 6443
POD_CIDR: 192.168.0.0/18
SERVICE_CIDR: 10.96.0.0/12
PROVISIONING_POOL_RANGE_START: 172.22.0.100
PROVISIONING_POOL_RANGE_END: 172.22.0.200
PROVISIONING_CIDR: 24
BAREMETALV4_POOL_RANGE_START: 192.168.111.100
BAREMETALV4_POOL_RANGE_END: 192.168.111.200
BAREMETALV6_POOL_RANGE_START: fd55::100
BAREMETALV6_POOL_RANGE_END: fd55::200
EXTERNAL_SUBNET_V4_PREFIX: 24
EXTERNAL_SUBNET_V4_HOST: 192.168.111.1
EXTERNAL_SUBNET_V6_PREFIX: 64
EXTERNAL_SUBNET_V6_HOST: fd55::1

and this is the debug output from clusterctl upgrade plan:

$ clusterctl upgrade plan -v 9
Using configuration File="/home/ubuntu/.cluster-api/clusterctl.yaml"
Checking cert-manager version...
Cert-Manager is already up to date

Checking new release availability...
Fetching File="metadata.yaml" Provider="cluster-api" Type="CoreProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="kubeadm" Type="BootstrapProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="kubeadm" Type="ControlPlaneProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="cluster-api" Type="CoreProvider" Version="v1.0.2"
Fetching File="metadata.yaml" Provider="metal3" Type="InfrastructureProvider" Version="v0.5.3"

If the metal3 infrastructure provider's metadata.yaml were taken from the overrides directory, the last message would instead be:
Using Override="metadata.yaml" Provider="infrastructure-metal3" Version="v1.0.0"

What did you expect to happen:
When setting up the overrides directory for a provider, clusterctl upgrade plan | apply is expected to use the metadata.yaml file from that overrides directory instead of fetching it from upstream. This is the behavior with clusterctl v0.4.4; however, it no longer holds with >=v1.0.0.

Environment:

  • Cluster-api version: >=v1.0.0
  • Minikube/KIND version: kind version 0.11.1
  • Kubernetes version (use kubectl version): client v1.22.4, server v1.22.2
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS

/kind bug
/area upgrade clusterctl
k8s-ci-robot added the kind/bug and area/clusterctl labels on Dec 8, 2021
k8s-ci-robot (Contributor) commented:

@namnx228: The label(s) area/upgrade cannot be applied, because the repository doesn't have them.

In response to this: [the issue description, quoted above]

namnx228 (Contributor, Author) commented Dec 8, 2021

/area upgrades

k8s-ci-robot added the area/upgrades label on Dec 8, 2021
sbueringer (Member) commented Dec 8, 2021

I think the clusterctl.yaml has to contain something like this:

providers:
- name: "cluster-api"
  type: "CoreProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/cluster-api/v1.1.99/core-components.yaml"
- name: "kubeadm"
  type: "BootstrapProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/bootstrap-kubeadm/v1.1.99/bootstrap-components.yaml"
- name: "kubeadm"
  type: "ControlPlaneProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/control-plane-kubeadm/v1.1.99/control-plane-components.yaml"
- name: "docker"
  type: "InfrastructureProvider"
  url: "/Users/buringerst/.cluster-api/dev-repository/infrastructure-docker/v1.1.99/infrastructure-components.yaml"
overridesFolder: "/Users/buringerst/.cluster-api/dev-repository/overrides"

(or let's say at least that's how it works in CI and on my machine :))

namnx228 (Contributor, Author) commented Dec 8, 2021

@sbueringer I expect it to work even without the providers field in clusterctl.yaml. With clusterctl <v1.0.0, it used the metadata.yaml from the overrides directory without the providers field.
One thing I should add is that clusterctl init in my case still uses the overrides directory, so only clusterctl upgrade has this issue; a hypothetical init invocation is sketched below.
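(A hypothetical sketch; the exact init command used was not shared in this thread. An invocation that pins the provider version would look like:

$ clusterctl init --infrastructure metal3:v1.0.0

When a version is pinned explicitly, clusterctl resolves exactly that version, which would explain why init picks up the v1.0.0 files from the overrides directory.)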

sbueringer (Member) commented:

Ah okay, I wasn't aware of that. Let's see what other folks who know more about clusterctl have to say.

namnx228 (Contributor, Author) commented Dec 9, 2021

cc @fabriziopandini

fabriziopandini (Member) commented:

@ykakarap PTAL

ykakarap (Contributor) commented:

@namnx228 I see that you mentioned that clusterctl init picked the correct overrides. Can you share more details about this case?
Can you share the version of clusterctl used when running the init command? Also, if you have it, can you share the init command you used? Did you specify a version of the provider when performing init?

This will help me in debugging further. Thank you :)

namnx228 (Contributor, Author) commented:

Hi @ykakarap, sorry for the late answer.
After digging into the clusterctl source code, here are my findings on how it uses the overrides directory during an upgrade. To upgrade a specific provider:

  • First, it checks whether there is any information about this provider in clusterctl.yaml (the providers field, as in the comment above). If the providers field is not set, it queries GitHub for the list of released versions.
  • Then it checks whether the latest released version is available in the overrides directory. If it is, it uses the override and reads metadata.yaml from there. Otherwise, it fetches metadata.yaml from that provider's latest release on GitHub.

To sum up, if we put a test version of a provider in the overrides directory (a version that is not in the list of released versions) and we don't point to it in clusterctl.yaml, clusterctl upgrade will ignore the local override and fetch the latest release from GitHub.
This behavior is not really a bug in recent clusterctl versions, because the code has looked like this for a long time (two years!), but I suggest we update the docs to make it clear that the providers field in clusterctl.yaml is needed for the overrides directory to be used in the upgrade workflow; see the sketch below.
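A minimal sketch of such a providers entry for this scenario (the path and the standard infrastructure-components.yaml file name are assumptions; they must match the actual local overrides layout), so that clusterctl upgrade resolves the local v1.0.0 instead of the latest GitHub release:

providers:
- name: "metal3"
  type: "InfrastructureProvider"
  url: "/home/ubuntu/.cluster-api/overrides/infrastructure-metal3/v1.0.0/infrastructure-components.yaml"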

fabriziopandini (Member) commented Dec 16, 2021

@namnx228 thanks for reporting the result of your investigation!

/remove-kind bug
/kind documentation

/retitle clarify how clusterctl overrides works
The point to stress in the doc is that overrides only provide file replacements; provider version resolution, instead, is based only on the actual repository structure.

k8s-ci-robot changed the title from "clusterctl upgrade not use override repository" to "clarify how clusterctl overrides works" on Dec 16, 2021
k8s-ci-robot added the kind/documentation label and removed the kind/bug label on Dec 16, 2021
fabriziopandini (Member) commented Jan 26, 2022

/milestone v1.2

k8s-ci-robot added this to the v1.2 milestone on Jan 26, 2022
k8s-triage-robot commented Apr 26, 2022

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 26, 2022
fabriziopandini (Member) commented Apr 27, 2022

/good-first-issue

We should document the fact that overrides only provide file replacements, while provider version resolution is based only on the actual repository structure, with a note at the end of https://cluster-api.sigs.k8s.io/clusterctl/configuration.html#overrides-layer. The sketch below illustrates the point.
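To illustrate with a hypothetical layout based on this issue's scenario (the versions are assumptions): if the provider's repository only has releases up to v0.5.3, version resolution stops there, and overrides merely replace files for versions that resolution has already found:

~/.cluster-api/overrides/infrastructure-metal3/
├── v0.5.3
│   └── metadata.yaml   <- used: v0.5.3 exists in the repository, so this file replaces the fetched one
└── v1.0.0
    └── metadata.yaml   <- ignored during upgrade: v1.0.0 is not in the repository's release list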

k8s-ci-robot (Contributor) commented:

@fabriziopandini:
This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this: [the /good-first-issue comment, quoted above]

k8s-ci-robot added the good first issue and help wanted labels on Apr 27, 2022
mukul-kr (Contributor) commented:

/assign

k8s-triage-robot commented Jun 23, 2022

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot removed the lifecycle/stale label and added the lifecycle/rotten label on Jun 23, 2022
sbueringer (Member) commented Jun 23, 2022

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Jun 23, 2022