Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove experimentalRetryJoin #6942

Open
killianmuldoon opened this issue Jul 18, 2022 · 26 comments
Open

Remove experimentalRetryJoin #6942

killianmuldoon opened this issue Jul 18, 2022 · 26 comments
Assignees
Labels
area/bootstrap Issues or PRs related to bootstrap providers area/control-plane Issues or PRs related to control-plane lifecycle management help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Copy link
Contributor

killianmuldoon commented Jul 18, 2022

experimentalRetryJoin is deprecated as of CAPI release 1.2. Removal is expected to happen in release 1.3 - but before proceeding we should have high confidence that CAPI + Kubeadm are capable today of solving the problem that experimentalRetryJoin was designed to fix.

In order to increase confidence in the removal we should:

  1. Ask providers which, if any, are still using experimentalRetryJoin, to what extent, and to solve what problem.
  2. Figure out a testing strategy for providers that would trigger the problem experimentalRetryJoin was designed to fix.
  3. Ensure these tests are passing and we have high confidence in the current retryJoin behaviour of Kubeadm.

Once we're confident that we can proceed with removal we need to:

  1. Remove the field from the bootstrap KubeadmConfig types + regenerate CRDs
  2. Stop embedding the kubeadm-bootsrap-script in bootstrap/kubeadm/internal/cloudinit/cloudinit.go
  3. Update cloudinit.go, removing functions and variables related to the embedded script e.g. generateBootstrapScript, kubeadmBootstrapScript
  4. Remove the KubeadmBootstrap script
  5. Add a note to the migration guide that the

/area bootstrap

@k8s-ci-robot k8s-ci-robot added the area/control-plane Issues or PRs related to control-plane lifecycle management label Jul 18, 2022
@sbueringer
Copy link
Member

sbueringer commented Jul 18, 2022

In my opinion we should never remove a field from a v1beta1 apiVersion without an apiVersion bump as it's a breaking change to v1beta1 otherwise.

@killianmuldoon
Copy link
Contributor Author

In my opinion we should never remove a field from a v1beta1 apiVersion without an apiVersion bump as it's a breaking change to v1beta1 otherwise.

Good point, but I suppose that doesn't necessarily preclude us from changing the behaviour. It's not exactly a feature gate, but the field is prefixed with experimental

@sbueringer
Copy link
Member

sbueringer commented Jul 18, 2022

So we would keep the field but disable its behavior? I think that's a very dangerous precedent and ~ the same as dropping the field.

@killianmuldoon
Copy link
Contributor Author

So we would keep the field but disable its behavior? I think that's a very dangerous precedent and ~ the same as dropping the field.

This is done regularly for experimental features as I understand it. The key question is whether we consider this experimental or not given the prefix. I think the most important thing here is whether the actual behaviour is fixed rather than the API fields and behaviour though.

@k8s-ci-robot k8s-ci-robot added the area/bootstrap Issues or PRs related to bootstrap providers label Jul 18, 2022
@killianmuldoon
Copy link
Contributor Author

Maybe an alternative path to deprecating this is to add a feature flag and a binary flag to enable this behaviour while deprecating and eventually removing the experimental API field. This would bring the way we enable the feature in line with our other experimental features like ClusterClass and MachinePools.

@fabriziopandini fabriziopandini added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 29, 2022
@vincepri
Copy link
Member

/kind api-change

As mentioned above, we need to deprecate now (just a comment on the field would suffice) + remove in the next API version

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Sep 26, 2022
@vincepri
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Sep 26, 2022
@fabriziopandini
Copy link
Member

fabriziopandini commented Sep 26, 2022

/milestone v1.3

we should get this done before the release

@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Sep 26, 2022
@fabriziopandini
Copy link
Member

/good-first-issue

@k8s-ci-robot k8s-ci-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Sep 26, 2022
@sbueringer
Copy link
Member

If I'm not mistaken deprecation was done in these PRs:

It was already part of v1.2.0 (https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.0).

I wonder a bit about removal though.

I don't have background information for what exactly this feature was needed and why we can drop it now (either because it's not needed anymore or we have a replacement I guess (?)).

Additional indication for that:

@yastij: why did we drop UseExperimentalRetryJoin it's still needed in slow/flaky envs
kubernetes-sigs/cluster-api-provider-vsphere#1602 (comment)

@killianmuldoon
Copy link
Contributor Author

I think we should remove the good first issue and help wanted labels, as this removal isn't something easily implemented. @fabriziopandini WDYT?

@fabriziopandini
Copy link
Member

fabriziopandini commented Sep 27, 2022

If I got this right what we are doing now is just applying a deprecated label on the API field (I have updated the issue title)
actual deprecation will happen in the next cycle

@fabriziopandini fabriziopandini changed the title Remove experimentalRetryJoin Deprecate experimentalRetryJoin Sep 27, 2022
@sbueringer
Copy link
Member

If I'm not mistaken deprecation was done in these PRs:

It was already part of v1.2.0 (https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.0).

I wonder a bit about removal though.

I don't have background information for what exactly this feature was needed and why we can drop it now (either because it's not needed anymore or we have a replacement I guess (?)).

Additional indication for that:

@yastij: why did we drop UseExperimentalRetryJoin it's still needed in slow/flaky envs
kubernetes-sigs/cluster-api-provider-vsphere#1602 (comment)

I'm slightly confused. As mentioned, I think we did this already?

@fabriziopandini
Copy link
Member

Ok I have missed this. reset.

@fabriziopandini
Copy link
Member

/remove-good-first-issue

@k8s-ci-robot k8s-ci-robot removed the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Sep 27, 2022
@fabriziopandini fabriziopandini changed the title Deprecate experimentalRetryJoin Remove experimentalRetryJoin Sep 27, 2022
@fabriziopandini fabriziopandini removed this from the v1.3 milestone Nov 2, 2022
@fabriziopandini
Copy link
Member

Dropping to the milestone
@yastij @randomvariable what is your opinion about making this happen in 1.4 timeframe (target date ~march 2023)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@sbueringer
Copy link
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2024
@fabriziopandini
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 12, 2024
@sbueringer
Copy link
Member

Just a note. Just checked linked issues and talked to some folks. We should be really able to remove the field as there were various improvements to kubeadm.

Also in the worst case were folks still need it they can remove the kubeadm binary through a script and implement their own retry handling there.

@neolit123
Copy link
Member

Just a note. Just checked linked issues and talked to some folks. We should be really able to remove the field as there were various improvements to kubeadm.

Also in the worst case were folks still need it they can remove the kubeadm binary through a script and implement their own retry handling there.

+1, i though this was removed already. :\

@neolit123
Copy link
Member

i see there are steps in the OP, i can give this a shot.
/assign

@sbueringer
Copy link
Member

sbueringer commented Apr 25, 2024

We have to wait for the next apiVersion (it's deprecated already)

I just additionally talked to some folks if they still depend on it

@fabriziopandini
Copy link
Member

based on the comment above
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 29, 2024
@chrischdi
Copy link
Member

Note: #10983 gives another indicator that we should not re-add this in v1beta2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bootstrap Issues or PRs related to bootstrap providers area/control-plane Issues or PRs related to control-plane lifecycle management help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

8 participants