Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

K8OP-294 Do not try to work out backup status if there are no pods #1462

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rzvoncek
Copy link
Contributor

@rzvoncek rzvoncek commented Nov 27, 2024

What this PR does: In the MedusaBackupJob controller, we add a check for pods being present. Pods might be absent for temporary reasons, such as nodepool replacement in the linked issue. If we find no pods, we cannot find the related backups (statuses) and crash. Since the pods are likely to come back, we do not stop requeing the reconciliation.
However, if we get the pods, but for some other reason we don't get the backup statuses, something worse happened and we can't recover. So we stop requeing.

Which issue(s) this PR fixes:
Fixes #1454

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

@rzvoncek rzvoncek changed the title K8OP-294 WIP reproduce no-pods situation in envtests + ensure it does not crash the operator K8OP-294 Do not try to work out backup status if there are no pods Nov 28, 2024
@rzvoncek rzvoncek marked this pull request as ready for review November 28, 2024 09:48
@rzvoncek rzvoncek requested a review from a team as a code owner November 28, 2024 09:48
@@ -103,6 +103,10 @@ func (r *MedusaBackupJobReconciler) Reconcile(ctx context.Context, req ctrl.Requ
logger.Error(err, "Failed to get datacenter pods")
return ctrl.Result{}, err
}
if len(pods) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the GetCassandraDatacenterPods..

	pods := make([]corev1.Pod, 0)
	pods = append(pods, podList.Items...)

Why not simply return podList.Items ? What's the point of making a new slice?

For this method, if there's no pods, why are we requeueing? Do we assume that pods will reappear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we re-create the slice. It makes sense to not do it. I pushed the commit fixing this.

Yes, we are re-queing because the assumption is the pods will indeed reappear. In the initial ticket, they were replacing nodepools, and they eventually came back. Checking how precisely this works might deserve at least a manual test, and perhaps a follow up ticket.

Copy link

sonarcloud bot commented Dec 3, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

k8ssandra-operator in CrashLoopBackOff during Kubernetes upgrade
2 participants