Since v0.2.7
Solr Clouds are complex distributed systems, and thus require a more delicate and informed approach to rolling updates.
If the `Managed` update strategy is specified in the SolrCloud CRD, then the Solr Operator takes control of deleting SolrCloud pods when they need to be updated.
The operator finds all pods that have not yet been updated and chooses the next set of pods to delete for an update, using the following workflow.
Note: Managed Updates are executed via Cluster Operation Locks; please refer to that documentation for more information about how these operations are executed.
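For reference, enabling the `Managed` strategy and its limits in the SolrCloud CRD looks roughly like the following sketch. The field names follow the SolrCloud CRD's `updateStrategy` section; the values shown are illustrative, not recommendations:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example
spec:
  updateStrategy:
    method: Managed
    managed:
      # Maximum number (or percentage) of pods that may be unavailable
      # at once during the rolling update. Illustrative value.
      maxPodsUnavailable: "25%"
      # Maximum number of replicas per shard that may be unavailable
      # at once. Illustrative value.
      maxShardReplicasUnavailable: 1
```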
The logic goes as follows:
- Find the pods that are out-of-date.
- Update all out-of-date pods that do not have a started Solr container.
  - This allows for updating a pod that cannot start, even if other pods are not available.
  - This step does not respect the `maxPodsUnavailable` option, because these pods have not even started the Solr process.
- Retrieve the cluster state of the SolrCloud if there are any `ready` pods.
  - If no pods are ready, then there is no endpoint from which to retrieve the cluster state.
- Sort the pods in order of safety for being restarted. (See the sorting order below.)
- Iterate through the sorted pods, greedily choosing which pods to update. (See the selection logic below.)
  - The maximum number of pods that can be updated is determined by starting with `maxPodsUnavailable`, then subtracting the number of updated pods that are unavailable as well as the number of not-yet-started, out-of-date pods that were updated in a previous step. This check makes sure that any pods taken down during this step do not violate the `maxPodsUnavailable` constraint.
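As a hypothetical illustration of that subtraction: with `maxPodsUnavailable` set to 3, one already-updated pod still unavailable, and one not-yet-started pod updated in the earlier step, this pass may take down at most 3 - 1 - 1 = 1 additional pod.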
The pods are sorted by the following criteria, in the given order. If any two pods are equal on a criterion, then the next criterion (in the order below) is used to sort them.
In this context the pods sorted highest are the first chosen to be updated; the pods sorted lowest are selected last.
- If the pod is the overseer, it will be sorted lowest.
- If the pod is not represented in the `clusterState`, it will be sorted highest.
  - A pod is not in the `clusterState` if it does not host any replicas and is not the overseer.
- Number of leader replicas hosted in the pod, sorted low -> high.
- Number of active or recovering replicas hosted in the pod, sorted low -> high.
- Number of total replicas hosted in the pod, sorted low -> high.
- If the pod's Solr Node is not a `liveNode`, it will be sorted lower.
- Any pods that are equal on the above criteria will be sorted lexicographically.
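For a hypothetical illustration of this ordering: a pod hosting no replicas (and therefore absent from the `clusterState`) sorts above a pod hosting two non-leader replicas, which in turn sorts above a pod hosting a leader replica; the overseer pod sorts last of all.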
Loop over the sorted pods, until the number of pods selected to be updated has reached the maximum.
This maximum is calculated by taking the given, or default, `maxPodsUnavailable` and subtracting the number of updated pods that are unavailable or have yet to be re-created.
- If the pod is the overseer, then all other pods must be updated and available. Otherwise, the overseer pod cannot be updated.
- If the pod contains no replicas, the pod is chosen to be updated.
  - WARNING: If you use Solr worker nodes for streaming expressions, you will likely want to set `maxPodsUnavailable` to a value you are comfortable with.
- If the Solr Node of the pod is not `live`, the pod is chosen to be updated.
- If all replicas in the pod are in a `down` or `recovery_failed` state, the pod is chosen to be updated.
- If taking down the replicas hosted in the pod would not violate the given `maxShardReplicasUnavailable`, then the pod can be updated. Once a pod with replicas has been chosen to be updated, the replicas hosted in that pod are considered unavailable for the rest of the selection logic.
  - Some replicas in the shard may already be in a non-active state, or may reside on Solr Nodes that are not "live". The `maxShardReplicasUnavailable` calculation takes these replicas into account as a starting point.
  - If a pod contains non-active replicas and the pod is chosen to be updated, then the replicas that are already non-active will not be double-counted in the `maxShardReplicasUnavailable` calculation.
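As a hypothetical illustration of the `maxShardReplicasUnavailable` check: for a shard with three replicas and `maxShardReplicasUnavailable` set to 1, a pod hosting one of those replicas can only be taken down if the other two replicas are active and reside on live nodes. If one of the other replicas is already down, it counts toward the limit as a starting point, so the pod cannot be selected in this pass.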
Given these complex requirements, `kubectl rollout restart statefulset` will generally not work on a SolrCloud.
One option to trigger a manual restart is to change one of the `podOptions` annotations. For example, you could set it to the date and time of the manual restart:
```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
spec:
  customSolrKubeOptions:
    podOptions:
      annotations:
        manualrestart: "2021-10-20T08:37:00Z"
```
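Because these annotations are part of the pod template, changing the value makes every pod out-of-date; under the `Managed` strategy the operator then restarts the pods using the same sorting and selection logic described above, rather than all at once.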
The Solr Operator sets up at least two Services for every SolrCloud.
- Always:
  - A ClusterIP Service for all Solr Nodes (what we call the "common service").
- Either:
  - A Headless Service for individual Solr Node endpoints that are not exposed via an Ingress, or
  - Individual ClusterIP Services for Solr Nodes that are exposed via an Ingress.

Only the common service uses the `publishNotReadyAddresses: false` option, since the common service should load balance between all available nodes.
The other services have individual endpoints for each node, so there is no reason to de-list pods that are not available.
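As an illustrative sketch (the name and labels here are assumptions, not the operator's exact output), a headless service that keeps not-ready pods resolvable sets `publishNotReadyAddresses: true`:

```yaml
apiVersion: v1
kind: Service
metadata:
  # Assumed name for illustration; the operator generates its own names.
  name: example-solrcloud-headless
spec:
  clusterIP: None                  # headless: DNS records per pod
  publishNotReadyAddresses: true   # keep endpoints even when pods are not ready
  selector:
    solr-cloud: example            # assumed label; actual selectors may differ
  ports:
    - name: solr-client
      port: 8983
      targetPort: 8983
```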
When doing a rolling upgrade, or taking down a pod for any reason, we want to first stop all requests to this pod.
Solr does this itself while stopping, by first taking itself out of the cluster's set of `liveNodes`, so that other Solr nodes and clients think it is not running.
However, for ephemeral clusters we are also evicting data before the pod is deleted, so we want to stop requests to this node since the data will soon no longer live there.
Kubernetes allows the Solr Operator to control whether a pod is considered `ready`, or available for requests, via readinessConditions/readinessGates.
When the Solr Operator begins the shut-down procedure for a pod, it first sets a readinessCondition to `false`, so that no load-balanced requests (through the common service) go to the pod.
This readinessCondition will stay `false` until the pod is deleted and a new pod is created in its place.
For this reason, it's a good idea to avoid very aggressive Update Strategies.
During a rolling restart with a high `maxPodsUnavailable`, requests that go through the common service might be routed to a very small number of pods.
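For reference, a readiness gate is declared on the pod spec and satisfied by a condition that a controller patches into the pod's status. A minimal sketch of the shape, with a hypothetical condition type (the Solr Operator's actual condition name may differ):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-solrcloud-0        # assumed pod name for illustration
spec:
  readinessGates:
    # Hypothetical condition type, for illustration only.
    - conditionType: "solr.apache.org/readyForTraffic"
  containers:
    - name: solrcloud-node
      image: solr:8.11
```

The kubelet only marks a pod `ready` while all of its readiness gate conditions are `True`, so flipping the condition to `False` removes the pod from the common service's endpoints without affecting the per-node services.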