Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KubeadmConfig controller should not rely on previously set status #3134

Closed
fabriziopandini opened this issue Jun 3, 2020 · 19 comments
Closed
Labels
area/bootstrap Issues or PRs related to bootstrap providers help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@fabriziopandini
Copy link
Member

What did you expect to happen:
As per comment #3122 (comment) and #3125 (comment)

The KubeadmConfig controller implements a subset of reconcile logic in the main reconcile loop

switch {
// Wait for the infrastructure to be ready.
case !cluster.Status.InfrastructureReady:
log.Info("Cluster infrastructure is not ready, waiting")
return ctrl.Result{}, nil
// Migrate plaintext data to secret.
case config.Status.BootstrapData != nil && config.Status.DataSecretName == nil:
if err := r.storeBootstrapData(ctx, scope, config.Status.BootstrapData); err != nil {
return ctrl.Result{}, err
}
return ctrl.Result{}, patchHelper.Patch(ctx, config)
// Reconcile status for machines that already have a secret reference, but our status isn't up to date.
// This case solves the pivoting scenario (or a backup restore) which doesn't preserve the status subresource on objects.
case configOwner.DataSecretName() != nil && (!config.Status.Ready || config.Status.DataSecretName == nil):
config.Status.Ready = true
config.Status.DataSecretName = configOwner.DataSecretName()
return ctrl.Result{}, patchHelper.Patch(ctx, config)
// Status is ready means a config has been generated.
case config.Status.Ready:
// If the BootstrapToken has been generated for a join and the infrastructure is not ready.
// This indicates the token in the join config has not been consumed and it may need a refresh.
if (config.Spec.JoinConfiguration != nil && config.Spec.JoinConfiguration.Discovery.BootstrapToken != nil) && !configOwner.IsInfrastructureReady() {
token := config.Spec.JoinConfiguration.Discovery.BootstrapToken.Token
remoteClient, err := r.remoteClientGetter(ctx, r.Client, util.ObjectKey(cluster), r.scheme)
if err != nil {
log.Error(err, "error creating remote cluster client")
return ctrl.Result{}, err
}
log.Info("refreshing token until the infrastructure has a chance to consume it")
err = refreshToken(remoteClient, token)
if err != nil {
// It would be nice to re-create the bootstrap token if the error was "not found", but we have no way to update the Machine's bootstrap data
return ctrl.Result{}, errors.Wrapf(err, "failed to refresh bootstrap token")
}
// NB: this may not be sufficient to keep the token live if we don't see it before it expires, but when we generate a config we will set the status to "ready" which should generate an update event
return ctrl.Result{
RequeueAfter: DefaultTokenTTL / 2,
}, nil
}
// In any other case just return as the config is already generated and need not be generated again.
return ctrl.Result{}, nil
}

We should move this to separated reconcile func, and make sure the logic does not rely on previously set status. Instead, it should check for the real state of things. E.g. Instead of checking status for ensuring the secret was created in a previous reconciliation loop, we should check if the secret actually exists.

/kind bug
/area bootstrap

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/bootstrap Issues or PRs related to bootstrap providers labels Jun 3, 2020
@fabriziopandini
Copy link
Member Author

Nb. if we implement lookup for existing secrets, this logic potentially can be removed

if !apierrors.IsAlreadyExists(err) {
return errors.Wrapf(err, "failed to create bootstrap data secret for KubeadmConfig %s/%s", scope.Config.Namespace, scope.Config.Name)
}

@vincepri
Copy link
Member

vincepri commented Jun 5, 2020

/priority backlog
/milestone v0.3.x

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jun 5, 2020
@k8s-ci-robot k8s-ci-robot added this to the v0.3.x milestone Jun 5, 2020
@vincepri
Copy link
Member

vincepri commented Aug 3, 2020

/kind cleanup
/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Aug 3, 2020
@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.x, v0.4.0 Aug 3, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2020
@fabriziopandini
Copy link
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2021
@vincepri
Copy link
Member

vincepri commented Feb 1, 2021

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 1, 2021
@fabriziopandini fabriziopandini removed the kind/bug Categorizes issue or PR as related to a bug. label Mar 10, 2021
@CecileRobertMichon
Copy link
Contributor

/help

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Apr 5, 2021
@vincepri
Copy link
Member

/milestone v1.0

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4, v1.0 Oct 19, 2021
@vincepri vincepri modified the milestones: v1.0, v1.1 Oct 22, 2021
@fabriziopandini fabriziopandini removed this from the v1.1 milestone Feb 3, 2022
@fabriziopandini fabriziopandini added this to the v1.2 milestone Feb 3, 2022
@sbueringer
Copy link
Member

/assign @fabriziopandini

@fabriziopandini
Copy link
Member Author

/unassign

@ramineni
Copy link
Contributor

ramineni commented Jul 1, 2022

/assign

@killianmuldoon
Copy link
Contributor

@ramineni Have you had time to work on this?

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@ramineni
Copy link
Contributor

ramineni commented Sep 22, 2022

@killianmuldoon sorry for the delay . Got time to get back on this.

Could you help me understand is this issue still relevant now? I see code is changed around

L204: case config.Status.BootstrapData != nil && config.Status.DataSecretName == nil:

this doesn't exist in reconcile loop anymore,

Nb. if we implement lookup for existing secrets, this logic potentially can be removed

Is this the pending item?

Thanks,

@fabriziopandini
Copy link
Member Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Sep 30, 2022
@ramineni ramineni removed their assignment Mar 21, 2023
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 20, 2024
@fabriziopandini
Copy link
Member Author

fabriziopandini commented Apr 22, 2024

The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.

This is more a code smell than something that creates real issues, and the issue is mostly inactive since sept 2022
/Close

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini: Closing this issue.

In response to this:

The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.

This is more a code small than something that creates real issues, and the issue is mostly inactive since sept 2022
/Close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bootstrap Issues or PRs related to bootstrap providers help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
9 participants