Zombie GCP Filestore Backups #531

Closed
abalaie opened this issue Aug 24, 2024 · 2 comments · Fixed by #554

Comments

@abalaie
Contributor

abalaie commented Aug 24, 2024

Description

During extensive manual tests, I noticed that multiple backups are sometimes created for the same CR. The root cause was that the status was not updated because of an error, so in the next reconciliation loop the backup creation was attempted again.
Because we have no reference to these extra backups, they are never cleaned up and incur cost without being used.
Moreover, if this can happen once, it can happen many times, and we will eventually end up with many unused resources.
The same can happen to every other resource whose only reference is the ID written to the status after the resource is created.
Expected result

I expect only one backup resource to get created.

Actual result

Two backups were created in GCP, and only the later one (created a few seconds after the first) was referenced by the GcpNfsVolumeBackup CR.

Steps to reproduce

Run the test manually, repeatedly. Please note that this was discovered during one of my manual tests and could be related to my test conditions. In theory, however, it can happen whenever one step requests the backup creation and a separate, later step writes its ID to the status, since that ID is the only way cloud-manager can reference the backup.

Troubleshooting

@abalaie
Contributor Author

abalaie commented Aug 24, 2024

I can think of two ways to solve this:

  • Set the ID on the status beforehand and use it to create the backup. This way, if the backup was already created, the next attempt fails with an "already exists" error.
  • Add labels to the backup so that an existing backup can be found before attempting to create it.

@dushanpantic
Contributor

Setting .status.id first, and then using it for name resolution, is how we implemented other resources.
I am voting for the first option.
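Once `.status.id` is set before any cloud call, the full resource name can be resolved from it on every loop. A hypothetical Go sketch: the `projects/{project}/locations/{location}/backups/{backup}` format follows Filestore's resource naming, while the ID rules in the regular expression and the `cm-` prefix derived from the CR's UID are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backupIDPattern approximates the Filestore backup ID rules (lowercase
// letters, digits and hyphens; starts with a letter; ends with a letter or
// digit; 1-63 characters). Treat the exact rules as an assumption.
var backupIDPattern = regexp.MustCompile(`^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$`)

// newStatusID derives the ID to store in .status.id before creating anything.
// The "cm-" prefix is hypothetical; it ensures the ID starts with a letter,
// which a Kubernetes UID does not guarantee.
func newStatusID(crUID string) string {
	return "cm-" + strings.ToLower(crUID)
}

// backupName resolves the full Filestore backup resource name from .status.id.
func backupName(project, location, statusID string) (string, error) {
	if !backupIDPattern.MatchString(statusID) {
		return "", fmt.Errorf("invalid backup id %q", statusID)
	}
	return fmt.Sprintf("projects/%s/locations/%s/backups/%s", project, location, statusID), nil
}

func main() {
	id := newStatusID("2b6f0c1e-8a4d-4f5e-9c3b-1d2e3f4a5b6c")
	name, err := backupName("my-project", "us-central1", id)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

Because the name is computed from state persisted before the create call, a retry after any failure addresses the same backup instead of creating a new one.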

2 participants