K8OP-294 Do not try to work out backup status if there are no pods #1462

rzvoncek · 2024-11-27T17:20:49Z

What this PR does: In the MedusaBackupJob controller, we add a check that pods are present. Pods might be absent for temporary reasons, such as the nodepool replacement in the linked issue. If no pods are found, we cannot find the related backups (statuses), and the operator crashes. Since the pods are likely to come back, we keep requeueing the reconciliation.
However, if we do get the pods but for some other reason don't get the backup statuses, something worse has happened and we can't recover, so we stop requeueing.

Which issue(s) this PR fixes:
Fixes #1454

Checklist

Changes manually tested
Automated Tests added/updated
Documentation added/updated
CHANGELOG.md updated (not required for documentation PRs)
CLA Signed: DataStax CLA

burmanm · 2024-11-28T15:41:01Z

controllers/medusa/medusabackupjob_controller.go

```diff
@@ -103,6 +103,10 @@ func (r *MedusaBackupJobReconciler) Reconcile(ctx context.Context, req ctrl.Requ
 		logger.Error(err, "Failed to get datacenter pods")
 		return ctrl.Result{}, err
 	}
+	if len(pods) == 0 {
```


This is related to GetCassandraDatacenterPods:

```go
pods := make([]corev1.Pod, 0)
pods = append(pods, podList.Items...)
```

Why not simply return podList.Items? What's the point of making a new slice?
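The difference between the two forms can be shown with a small sketch. `Pod` and `PodList` are hypothetical stand-ins for `corev1.Pod` and `corev1.PodList`, and `podsCopied` / `podsDirect` are illustrative names, not functions from the repository. Returning `Items` directly shares the backing array with the list, which should be harmless assuming the list is a fresh local value; an empty list yields a nil slice instead of an empty one, but a `len(pods) == 0` check treats both the same way.

```go
package main

// Pod and PodList are hypothetical stand-ins for corev1.Pod and corev1.PodList.
type Pod struct{ Name string }
type PodList struct{ Items []Pod }

// podsCopied mirrors the original pattern: allocate an empty slice and append,
// which copies every element into a new backing array.
func podsCopied(list PodList) []Pod {
	pods := make([]Pod, 0)
	pods = append(pods, list.Items...)
	return pods
}

// podsDirect returns the list's slice as suggested in review. No copy is made,
// so the result shares its backing array with list.Items.
func podsDirect(list PodList) []Pod {
	return list.Items
}
```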

For this method, if there are no pods, why are we requeueing? Do we assume that the pods will reappear?

rzvoncek · 2024-11-29

I don't know why we re-create the slice, and it makes sense not to. I pushed a commit fixing this.

Yes, we are requeueing because the assumption is that the pods will indeed reappear. In the initial ticket, the nodepools were being replaced, and the pods eventually came back. Checking precisely how this works might deserve at least a manual test, and perhaps a follow-up ticket.

controllers/medusa/medusabackupjob_controller.go

controllers/medusa/medusabackupjob_controller_test.go

sonarcloud · 2024-12-03T10:45:29Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

rzvoncek force-pushed the K8OP-294 branch from 47d0b95 to 70568c9 on November 28, 2024 09:43

rzvoncek changed the title ~~K8OP-294 WIP reproduce no-pods situation in envtests + ensure it does not crash the operator~~ K8OP-294 Do not try to work out backup status if there are no pods Nov 28, 2024

rzvoncek marked this pull request as ready for review November 28, 2024 09:48

rzvoncek requested a review from a team as a code owner November 28, 2024 09:48

K8OP-294 Do not try to work out backup status if there are no pods

ed2a660

rzvoncek force-pushed the K8OP-294 branch from 70568c9 to ed2a660 on November 28, 2024 09:49

Fix the medusa task controller test

417b98a

burmanm reviewed Nov 28, 2024

controllers/medusa/medusabackupjob_controller.go (outdated, resolved)

burmanm reviewed Nov 28, 2024

controllers/medusa/medusabackupjob_controller_test.go (resolved)

burmanm reviewed Nov 28, 2024

controllers/medusa/medusabackupjob_controller_test.go (outdated, resolved)

burmanm reviewed Nov 28, 2024

controllers/medusa/medusabackupjob_controller_test.go (outdated, resolved)

rzvoncek added 8 commits November 29, 2024 11:16

Do not duplicate the slice with pods

ca96675

Return an error, not nil, if backupsummary is not found

7462772

Replace int32 with int when deleting pods in test

d5607c6

Return nil backup summary for backup with no pods

d9b53f5

Remove get before delete in medusabackupjob_controller_test

5e1f31a

Use a dedicated test backup for the nil backupSummary case

fc97af9

Return reconcile.TerminalError if getBackupSummary originates the error

d04f2ed

Expect backup with nil summary to actually start

3ae146a
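The contract suggested by commits d9b53f5 ("Return nil backup summary for backup with no pods") and 7462772 ("Return an error, not nil, if backupsummary is not found") can be sketched like this. It is a hypothetical, stdlib-only illustration: it assumes the per-pod summaries have already been fetched into a map keyed by pod name, and `BackupSummary` with its fields is a placeholder, not the operator's real type. With no pods it returns `(nil, nil)`; with pods but no matching summary it returns an error, which the caller would then wrap as a terminal error.

```go
package main

import "fmt"

// BackupSummary is a hypothetical placeholder; the field names are assumptions.
type BackupSummary struct {
	BackupName string
	Status     string
}

// getBackupSummary sketches the lookup contract described by the commits.
// podSummaries is an assumed pre-fetched map from pod name to that pod's summaries.
func getBackupSummary(backupName string, podSummaries map[string][]BackupSummary) (*BackupSummary, error) {
	if len(podSummaries) == 0 {
		// No pods: there is nothing to look up yet, so return a nil summary
		// without an error.
		return nil, nil
	}
	for _, summaries := range podSummaries {
		for i := range summaries {
			if summaries[i].BackupName == backupName {
				return &summaries[i], nil
			}
		}
	}
	// Pods exist but none reports this backup: treat it as an error.
	return nil, fmt.Errorf("backup summary for %q not found", backupName)
}
```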

rzvoncek force-pushed the K8OP-294 branch from f003b04 to 3ae146a on December 3, 2024 10:44

burmanm approved these changes Dec 12, 2024