The datavolume pointing to an existing virtual machine was deleted by the garbage collector #3134
Comments
Since it is the garbage collector deleting the DV, it will also delete the PVC, since the DV owns it. That makes sense. But for the garbage collector to delete the DV, it must think the owner (the VM) is being deleted as well. Can you check whether the deletionTimestamp on the VM resource is set? That is the only reason I can think of that could cause the deletion of the DV by the GC.
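For reference, one quick way to check this (a sketch; the namespace placeholder is not from this issue) is to read the field directly with jsonpath:

```shell
# prints nothing if the VM is not being deleted
kubectl get vm evm-cfb1bef9upshljjib6ng -n <namespace> \
  -o jsonpath='{.metadata.deletionTimestamp}'
```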
Very happy to receive your reply. The deletionTimestamp of VM evm-cfb1bef9upshljjib6ng is not set.
Okay, well that exhausts my possible ideas. Note that the versions of Kubernetes and CDI you are running are very old, and we really only support n - 2, so that is 1.58 to 1.56; sometimes for critical issues we will go back a little further. It seems to me you hit a bug in Kubernetes here, as I am not aware of anything on our end that would delete DataVolumes in the background. What version of KubeVirt are you running?
CDI version (use `kubectl get deployments cdi-deployment -o yaml`): v1.41.0
One thing you could try is to set up audit logging to see the exact sequence of events. And let me echo again the worry about the old k8s/CDI version, as @awels mentioned.
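As a sketch of what such a policy could look like (the file name and apiserver flags are assumptions about your cluster setup, not details from this issue), deletes of DataVolumes and PVCs can be logged at the RequestResponse level:

```yaml
# audit-policy.yaml - capture who deletes DataVolumes and PVCs
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  verbs: ["delete", "deletecollection"]
  resources:
  - group: "cdi.kubevirt.io"
    resources: ["datavolumes"]
  - group: ""
    resources: ["persistentvolumeclaims"]
# keep the rest of the log small
- level: Metadata
```

The policy is passed to the kube-apiserver with `--audit-policy-file`, and the output location with `--audit-log-path`.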
Thank you for the enthusiastic discussion @awels @akalenyu. Our team has identified the root cause of the problem: we use a cluster-level CR, MigrationPolicy, to control the live migration process, and we set an ownerReference on it pointing to a namespace-level VM (kubevirt.io/v1alpha3/VirtualMachine), which triggered the bug in 98471. We have changed how MigrationPolicy is used: the ownerReference is removed, so it is no longer collected by the garbage collector of the kube-controller-manager; instead, a self-developed controller is now responsible for managing the lifecycle of MigrationPolicy.
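For anyone hitting the same thing, a minimal sketch of the problematic pattern (the apiVersion, names, and UID below are illustrative placeholders, not taken from the cluster in this issue): a cluster-scoped MigrationPolicy carrying an ownerReference to a namespaced VirtualMachine, which is exactly the cross-scope ownership the garbage collector cannot resolve.

```yaml
# Illustrative only: a cluster-scoped object must never reference a namespaced
# owner; the garbage collector treats the owner as unresolvable and may end up
# deleting dependents it should not touch.
apiVersion: migrations.kubevirt.io/v1alpha1   # placeholder group/version
kind: MigrationPolicy                         # cluster-scoped CR
metadata:
  name: example-policy                        # hypothetical name
  ownerReferences:
  - apiVersion: kubevirt.io/v1alpha3
    kind: VirtualMachine                      # namespaced resource - the problematic part
    name: evm-cfb1bef9upshljjib6ng
    uid: 00000000-0000-0000-0000-000000000000 # placeholder UID
spec: {}                                      # policy spec omitted for brevity
```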
What happened:
We have a VirtualMachine and a DataVolume; the VM yaml is as below:
and the DV yaml:
Everything was normal until 2024-02-21 02:38:00, when the VM evm-cfb1bef9upshljjib6ng failed and the DataVolume pointing to the VM was deleted by the garbage collector. kube-controller-manager log:
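(The log excerpt is from the kube-controller-manager; as a sketch, on a kubeadm-style cluster where it runs as a static pod, the garbage-collector lines can be pulled like this, with the node name being cluster-specific:)

```shell
kubectl -n kube-system logs kube-controller-manager-<control-plane-node> | grep -i garbagecollector
```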
I tried to find the cause of the problem from several directions:
What you expected to happen:
While the virtual machine exists, the DataVolume whose ownerReference points to that virtual machine must not be deleted.
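For context, the intended ownership is a same-namespace ownerReference on the DataVolume pointing back at the VirtualMachine, roughly of this shape (names, namespace, and UID are placeholders, not copied from our cluster):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: <dv-name>
  namespace: <vm-namespace>
  ownerReferences:
  - apiVersion: kubevirt.io/v1alpha3
    kind: VirtualMachine
    name: evm-cfb1bef9upshljjib6ng
    uid: 00000000-0000-0000-0000-000000000000   # placeholder
    controller: true
    blockOwnerDeletion: true
```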
How to reproduce it (as minimally and precisely as possible):
Reproducing does not seem to be easy; at least we have not found a reliable way to trigger it.
Additional context:
Before the DV was deleted, a new kube-controller-manager leader had been elected.
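As a sketch of how the election timing can be confirmed (event reasons and the lock location depend on the controller-manager configuration, so treat this as an assumption):

```shell
# leader changes show up as events with reason LeaderElection ("... became leader")
kubectl -n kube-system get events --field-selector reason=LeaderElection
```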
Environment:
- CDI version (use `kubectl get deployments cdi-deployment -o yaml`): v1.41.0
- Kubernetes version (use `kubectl version`): v1.18.19
- Kernel (`uname -a`): 4.18.0-3.3.el7.x86_64