Description
We find that zookeeper-operator can accidentally delete the PVCs of the ZooKeeper pods when the operator experiences a restart and talks to a stale apiserver in an HA k8s cluster. After several rounds of inspection, we find that the root cause is the staleness of the events fed from the apiserver to the controller.
More concretely, the controller performs reconcile() according to the events watched from the apiserver, and the watch is served from the apiserver's local cache without any strong consistency guarantee. If that apiserver's watch cache is stale (e.g., due to a network partition), the controller will perform reconcile() according to the stale events, which can lead to failures.
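For illustration, here is a minimal client-go sketch (not taken from the operator's code) of the consistency semantics involved: a list or watch that specifies ResourceVersion "0" may be answered from the apiserver's watch cache rather than a quorum read from etcd, so the returned data, like the watch events the controller consumes, can be stale.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// ResourceVersion "0" means "any version is acceptable": the apiserver is
	// allowed to answer from its watch cache instead of performing a quorum
	// read from etcd, so the result (like watch events) may lag behind the
	// latest state.
	pvcs, err := cs.CoreV1().PersistentVolumeClaims("default").List(
		context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("observed %d PVCs (possibly from a stale cache)\n", len(pvcs.Items))
}
```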
In particular, if the controller receives a stale update event indicating "ZookeeperCluster has a non-zero deletion timestamp" from the stale apiserver, it believes a deletion is in progress and decides to delete the PVCs of that ZookeeperCluster. Note that such a stale event actually comes from the deletion of a previous ZookeeperCluster that has a different UID from the currently running one. However, the controller lists all related PVCs using only the name and namespace, roughly as sketched below:
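The following is a hedged sketch of the name/namespace-only lookup (the label key and client wiring are assumptions, not the operator's verbatim code): every PVC labelled with the cluster name is selected, regardless of which ZookeeperCluster instance (i.e., which UID) created it.

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listPVCsByName selects PVCs purely by namespace and a name-based label,
// so PVCs created by a newer, same-named ZookeeperCluster also match.
func listPVCsByName(ctx context.Context, c client.Client, name, namespace string) (*corev1.PersistentVolumeClaimList, error) {
	pvcs := &corev1.PersistentVolumeClaimList{}
	err := c.List(ctx, pvcs,
		client.InNamespace(namespace),
		client.MatchingLabels{"app": name}, // name-based selector only, no UID
	)
	return pvcs, err
}
```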
If the currently running ZookeeperCluster uses the same name as the previously deleted one (e.g., we delete and recreate a ZookeeperCluster), the controller will mistakenly conclude that the listed PVCs belong to the "deleted ZookeeperCluster" and invoke deletePVC to delete them.
We list the concrete reproduction steps below:
1. Run the controller in an HA k8s cluster and create a ZookeeperCluster zkc. The controller is talking to apiserver1 (which is not stale).
2. Delete zkc. apiserver1 sends update events with a non-zero deletion timestamp to the controller, and the controller deletes the related PVCs of zkc during reconcile. Meanwhile, apiserver2 is partitioned, so its watch cache stops at the moment zkc is tagged with the deletion timestamp.
3. Create a ZookeeperCluster with the same name again. The ZookeeperCluster and its PVCs come back. However, apiserver2 still holds the stale view that zkc has a non-zero deletion timestamp and is about to be deleted.
4. The controller crashes due to a node failure and restarts (or a standby replica becomes the leader). This time the controller talks to the stale apiserver2 and receives the stale update events indicating that zkc has a deletion timestamp. Since the controller lists PVCs using only the name and namespace, all the PVCs belonging to the newly created ZookeeperCluster are listed and deleted.
Importance
blocker: This bug makes the controller perform unexpected deletions when it reads stale data from the apiserver, which can further lead to data loss or availability issues.
Location
zookeepercluster_controller.go
Suggestions for an improvement
We are willing to issue a PR to help fix this.
The bug can be fixed by tagging each PVC with the UID of the owning ZookeeperCluster in MakeStatefulSet and listing PVCs by that UID. Each ZookeeperCluster instance has a unique UID even when it reuses the same name, so PVCs belonging to the current ZookeeperCluster will not be deleted in response to events about the old one. We can issue a PR that adds the UID as a label on each PVC; a rough sketch follows.
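As a rough sketch of the proposed fix (the label keys and helper names below are illustrative, not the final PR): MakeStatefulSet would stamp the owning ZookeeperCluster's UID onto each PVC template, and the cleanup path would then select PVCs by that UID in addition to the name.

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// addUIDLabel would be called from MakeStatefulSet for every PVC template so
// that each PVC records which ZookeeperCluster instance created it.
func addUIDLabel(owner metav1.Object, tmpl *corev1.PersistentVolumeClaim) {
	if tmpl.Labels == nil {
		tmpl.Labels = map[string]string{}
	}
	tmpl.Labels["uid"] = string(owner.GetUID()) // hypothetical label key
}

// listPVCsForInstance selects only the PVCs carrying the current instance's
// UID, so stale events about an older, same-named ZookeeperCluster no longer
// match the new instance's PVCs.
func listPVCsForInstance(ctx context.Context, c client.Client, owner metav1.Object) (*corev1.PersistentVolumeClaimList, error) {
	pvcs := &corev1.PersistentVolumeClaimList{}
	err := c.List(ctx, pvcs,
		client.InNamespace(owner.GetNamespace()),
		client.MatchingLabels{
			"app": owner.GetName(),        // existing name-based label (assumed)
			"uid": string(owner.GetUID()), // new UID label
		},
	)
	return pvcs, err
}
```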