Fixes pre-deploy step having incorrect desired labels and adds a temp fix for race condition #835

Merged
merged 2 commits into from
Jul 22, 2024

@unmarshall unmarshall commented Jul 22, 2024

How to categorize this PR?

/area control-plane
/kind bug

What this PR does / why we need it:
In the hotfix branch, a PreDeploy step was introduced to ensure that etcd-druid stays backward compatible with the changes made in master to pod labels and the StatefulSet label selector. An issue with this step was discovered when running the g/g e2e tests.

The desired labels passed to utils.ContainsAllDesiredLabels include the additional labels taken from etcd.Spec.Labels. When the g/g e2e tests upgrade an existing etcd cluster from 1 to 3 replicas, g/g adds extra networking labels to the etcd resource. These labels are never present on the pods during the PreDeploy step, so the check never passes and reconciliation gets stuck at this stage. Because the Deploy step is never called, the new labels that g/g added to the etcd resource never make it to the pods.
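To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the code from this PR: `containsAllDesiredLabels` is a hypothetical stand-in for `utils.ContainsAllDesiredLabels`, `merge` is a helper invented for the example, all label keys and values are made up, and treating the fix as "leave `etcd.Spec.Labels` out of the desired set" is an assumption based on the description.

```go
package main

import "fmt"

// containsAllDesiredLabels reports whether every key/value pair in desired
// also appears in actual. Hypothetical stand-in for the helper named in the
// PR description, not a copy of it.
func containsAllDesiredLabels(actual, desired map[string]string) bool {
	for key, want := range desired {
		if got, ok := actual[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// merge returns a new map holding the entries of all inputs; later maps win.
func merge(maps ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

func main() {
	// All label keys and values below are made up for illustration.
	oldLabels := map[string]string{"old-selector-key": "etcd-main"}
	newLabels := map[string]string{"new-selector-key": "etcd-main"}
	specLabels := map[string]string{"networking.example/allow": "allowed"}

	// During PreDeploy the pods carry only the old and the new labels.
	podLabels := merge(oldLabels, newLabels)

	// Desired set includes labels from the etcd spec: the check cannot pass
	// until Deploy runs, and Deploy is never reached.
	fmt.Println(containsAllDesiredLabels(podLabels, merge(oldLabels, newLabels, specLabels))) // false

	// Desired set limited to old and new labels: the check passes.
	fmt.Println(containsAllDesiredLabels(podLabels, merge(oldLabels, newLabels))) // true
}
```

The point of the sketch is only that a subset check over labels is sensitive to what goes into the desired set: any label that is applied later than the check will block it indefinitely.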

NOTE: This is not caught in the druid e2e tests because they do not add additional labels to the etcd resource when running the upgrade test from 1 to 3 replicas. We should change the tests to do so, so that issues like this are caught early rather than only when the g/g e2e tests are run.

Additionally, we add one more safeguard in the custodian controller: if the reconcile annotation is still present on the etcd resource, the custodian skips reconciliation and retries after 10s. This complements the existing predicate, which does something similar. However, the predicate does not prevent requests from being requeued after an error in the custodian, and those requeues can still interfere with the etcd reconciler.
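The safeguard can be sketched roughly as follows. This is a simplified illustration rather than the custodian's actual code: the `result` type stands in for a controller-runtime reconcile result, `custodianReconcile` and `updateStatus` are hypothetical names, and the annotation key and value (`gardener.cloud/operation: reconcile`) are assumed here, since the description only refers to "the reconcile annotation".

```go
package main

import (
	"fmt"
	"time"
)

const (
	// Assumed annotation key and value; the PR description does not spell them out.
	operationAnnotation = "gardener.cloud/operation"
	operationReconcile  = "reconcile"

	// Retry interval taken from the PR description.
	requeueInterval = 10 * time.Second
)

// result is a simplified stand-in for a controller-runtime reconcile result.
type result struct {
	RequeueAfter time.Duration
}

// custodianReconcile skips its work while the reconcile annotation is still
// present and asks to be retried later. Hypothetical function for illustration.
func custodianReconcile(annotations map[string]string, updateStatus func() error) (result, error) {
	if annotations[operationAnnotation] == operationReconcile {
		// The etcd reconciler has not picked up the request yet: back off.
		return result{RequeueAfter: requeueInterval}, nil
	}
	if err := updateStatus(); err != nil {
		return result{}, err
	}
	return result{}, nil
}

func main() {
	noop := func() error { return nil }

	res, err := custodianReconcile(map[string]string{operationAnnotation: operationReconcile}, noop)
	fmt.Println(res.RequeueAfter, err) // 10s <nil>

	res, err = custodianReconcile(map[string]string{}, noop)
	fmt.Println(res.RequeueAfter, err) // 0s <nil>
}
```

Checking inside the reconcile function, rather than only in an event predicate, also covers requests that re-enter the queue after an earlier error, which is the gap the description points out.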

Which issue(s) this PR fixes:
Fixes #836

Special notes for your reviewer:

Release note:

Fixes the label comparison check in the PreDeploy step, which ensures that the pods have both the old and the new labels.

@unmarshall unmarshall requested a review from a team as a code owner July 22, 2024 10:01
@gardener-robot gardener-robot added area/control-plane Control plane related kind/bug Bug needs/review Needs review size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Jul 22, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jul 22, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jul 22, 2024
@ishan16696
Member

/retest

@ishan16696
Member

/test pull-etcd-druid-e2e-kind-nondistroless-etcd

Member

@ishan16696 ishan16696 left a comment

LGTM!!

Contributor

@seshachalam-yv seshachalam-yv left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Jul 22, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jul 22, 2024
@ishan16696 ishan16696 merged commit 86d8007 into hotfix-v0.22 Jul 22, 2024
11 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jul 22, 2024
@ishan16696 ishan16696 deleted the racefix branch July 23, 2024 04:38