
[BUG] Cassandra node can be decommissioned wrongly which blocks scale down #400

Closed
srteam2020 opened this issue Jan 14, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@srteam2020
Contributor

Describe the bug

We have found that the cassandraDatacenter controller sometimes fails to scale down the Cassandra cluster because it decommissions the wrong Cassandra node. In scaleStatefulSet, the last pod returned by List is picked as the one to decommission:

newestPod := podsInRack[len(podsInRack)-1]

The List fetches the pods from the local indexer. However, the indexer is implemented as a map[string]interface{}, so no order is guaranteed for its keys. If two Cassandra pods ca-0 and ca-1 are running, List can return either [ca-0, ca-1] or [ca-1, ca-0]. In the latter case, the controller will decommission ca-0. Later, when the Kubernetes statefulset controller reconciles the statefulset, it will choose the pod with the largest ordinal to delete, which is ca-1 in this case, so the decommissioning can be inconsistent with the pod deletion. More importantly, the statefulset controller only deletes a pod once all of its predecessor pods are in good status ("Running and Ready"). Since ca-0's Cassandra node has been decommissioned, ca-0 no longer counts as Running and Ready, and the deletion of ca-1 is blocked forever.
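For illustration, here is a minimal, self-contained Go sketch of why no order can be assumed (the map below merely stands in for the indexer's internal map[string]interface{}; it is not the client-go code itself):

package main

import "fmt"

func main() {
	// Go intentionally randomizes map iteration order, so any listing built by
	// ranging over a map may come back as ["ca-0", "ca-1"] on one call and
	// ["ca-1", "ca-0"] on the next.
	pods := map[string]interface{}{"ca-0": nil, "ca-1": nil}
	for name := range pods {
		fmt.Println(name) // order is not guaranteed
	}
}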

To Reproduce

  1. Create a cassandradatacenter cdc with node=2 (two pods now exist: ca-0 and ca-1)
  2. Scale cdc down by changing node to 1. If the List happens to return [ca-1, ca-0], ca-0 will be decommissioned, the deletion of ca-1 will get stuck forever, and the scale down will never succeed.

Note that the bug is nondeterministic since map iteration order is not guaranteed: if [ca-0, ca-1] is returned, everything is fine, but we do sometimes observe the other order being returned, which causes the problem described above.

Expected behavior

A potential fix is to extract the ordinal of each pod the same way the statefulset controller does (see the code below) and pick the pod with the largest ordinal to decommission.

Code in the statefulset controller that extracts the ordinal for a pod:

// statefulPodRegex extracts the parent StatefulSet name and the ordinal from a
// pod's name; it is defined alongside this helper in the statefulset controller.
var statefulPodRegex = regexp.MustCompile("(.*)-([0-9]+)$")

// getParentNameAndOrdinal gets the name of pod's parent StatefulSet and pod's ordinal as extracted from its Name. If
// the Pod was not created by a StatefulSet, its parent is considered to be empty string, and its ordinal is considered
// to be -1.
func getParentNameAndOrdinal(pod *v1.Pod) (string, int) {
	parent := ""
	ordinal := -1
	subMatches := statefulPodRegex.FindStringSubmatch(pod.Name)
	if len(subMatches) < 3 {
		return parent, ordinal
	}
	parent = subMatches[1]
	if i, err := strconv.ParseInt(subMatches[2], 10, 32); err == nil {
		ordinal = int(i)
	}
	return parent, ordinal
}
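
As an example, here is a hedged sketch of how the fix could use this helper to pick the decommission target (the function name podToDecommission and the podsInRack parameter are only illustrative, borrowed from the snippet above; this is not the operator's actual code):

// podToDecommission returns the pod with the largest ordinal, which is the
// same pod the statefulset controller will delete during scale-down.
func podToDecommission(podsInRack []*v1.Pod) *v1.Pod {
	var newest *v1.Pod
	highestOrdinal := -1
	for _, pod := range podsInRack {
		if _, ordinal := getParentNameAndOrdinal(pod); ordinal > highestOrdinal {
			highestOrdinal = ordinal
			newest = pod
		}
	}
	return newest
}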

Environment

  • OS Linux
  • Kubernetes version v1.18.9
  • kubectl version v1.20.1
  • Go version 1.13.9
  • Cassandra version 3

Additional context
We are willing to file a patch for this issue.

srteam2020 added the bug label on Jan 14, 2021
@smiklosovic
Collaborator

smiklosovic commented Jan 14, 2021

Hi @srteam2020,

Yes, the patch would be very good! I am very sorry this one slipped through. If you have some cycles to fix this, it would be awesome.

I have not forgotten your first patch; I am just working on something in the background, so I will cut a release sooner or later, since what I am doing depends on the operator, but I've already merged that patch locally. If you manage to fix this one, I will release images with both fixes.

Regards

@srteam2020
Contributor Author

PR filed here: #401
It borrows the code from the statefulset controller.

@smiklosovic
Collaborator

Merged / released, thanks.
