Zombie GCP Filestore Backups #531

abalaie · 2024-08-24T00:08:26Z

Description

During extensive manual testing, I noticed that multiple backups are sometimes created for the same CR. The root cause was that the status update failed with an error, so the next reconcile loop attempted the backup again.
Because we hold no reference to these extra backups, they are never cleaned up and incur additional cost without being used.
If this can happen once, it can happen repeatedly, and we will eventually end up with many unused resources.
The same can happen to any other resource that relies solely on the resource ID being written to the status.

Expected result

I expect only one backup resource to get created.

Actual result

Two backups were created in GCP, and only the later one (created a few seconds afterwards) was referenced by the GcpNfsVolumeBackup CR.

Steps to reproduce

Run a test manually, again and again. Please note that this was discovered during one of my manual tests and could be related to my test conditions. In theory, though, it can happen whenever one line requests the backup creation and a separate line updates the status with the backup's ID, which is the only way to reference it from cloud-manager.

Troubleshooting

abalaie · 2024-08-24T00:10:12Z

I can think of two ways to solve this:

1. Set the ID on the status beforehand and use it to create the backup. That way, if the backup was already created, the next attempt fails with an "object already exists" error.
2. Add labels to the backup so that an existing backup can be found before attempting to create it.

dushanpantic · 2024-08-26T08:11:21Z

Setting .status.id first, and then using it for name resolution, is how we implemented other resources.
I am voting for the first option.

abalaie mentioned this issue Aug 26, 2024

NFS Backup/Restore - GCP #396

Open

8 tasks

abalaie added this to the v1.1 - NFS Backup/Restore GA milestone Aug 26, 2024

abalaie self-assigned this Aug 28, 2024

abalaie mentioned this issue Aug 31, 2024

feat: preventing zombie filestore backups and patching status instead… #554

Merged

kyma-bot closed this as completed in #554 Sep 5, 2024
