# Lifecycle hooks & Finalizers

The ArangoDB operator expects full control of the `Pods` and `PersistentVolumeClaims` it creates.
Therefore, it takes measures to prevent the removal of those resources
until it is safe to do so.

To achieve this, the server containers in the `Pods` have
a `preStop` hook configured, and finalizers are added to the `Pods`
and `PersistentVolumeClaims`.

The `preStop` hook executes a binary that waits until all finalizers of
the current pod have been removed.
Until this `preStop` hook terminates, Kubernetes will not send a `TERM` signal
to the processes inside the container, which ensures that the server keeps running
until it is safe to stop it.
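
The waiting behaviour of the hook binary can be sketched as a simple polling loop. This is a minimal sketch, not the operator's actual code: it assumes the binary can read the current pod's finalizers through some callable (`get_finalizers` is hypothetical; a real implementation would query the Kubernetes API), and polling at a fixed interval is an assumed strategy.

```python
import time
from typing import Callable, List


def wait_for_finalizers(
    get_finalizers: Callable[[], List[str]],
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the pod reports no finalizers; return how many polls it took.

    `get_finalizers` is a hypothetical callable returning the pod's current
    finalizer list (e.g. fetched from the Kubernetes API).
    """
    polls = 0
    while True:
        polls += 1
        if not get_finalizers():
            # All finalizers removed: let the preStop hook exit so TERM can be sent.
            return polls
        sleep(poll_interval)
```

Injecting `get_finalizers` and `sleep` keeps the loop testable without a cluster.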

The operator performs all actions needed when the deletion of a `Pod` or
`PersistentVolumeClaim` has been triggered.
For example, for a dbserver it cleans out the server if its `Pod` and `PersistentVolumeClaim` are being deleted.

## Lifecycle init-container

Because the binary that is called in the `preStop` hook is not part of a standard
ArangoDB Docker image, it has to be brought into the filesystem of the `Pod`.
This is done by an init-container that copies the binary to an `emptyDir` volume that
is shared between the init-container and the server container.
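
The wiring described above can be illustrated as a Kubernetes `PodSpec` fragment built as a Python dict. The field names (`volumes`, `emptyDir`, `initContainers`, `volumeMounts`, `lifecycle.preStop.exec.command`) are standard Kubernetes `PodSpec` fields; the volume name, paths, binary name, hook arguments and container names are hypothetical placeholders, not the operator's real values.

```python
from typing import Any, Dict

# Hypothetical names, chosen only for this illustration.
LIFECYCLE_VOLUME = "lifecycle"
LIFECYCLE_DIR = "/lifecycle/tools"
SOURCE_BINARY = "/usr/bin/lifecycle-hook"      # location inside the init image (assumed)
LIFECYCLE_BINARY = LIFECYCLE_DIR + "/lifecycle-hook"


def pod_spec_with_lifecycle(server_image: str, init_image: str) -> Dict[str, Any]:
    """Build a PodSpec fragment sharing an emptyDir between init and server containers."""
    mount = {"name": LIFECYCLE_VOLUME, "mountPath": LIFECYCLE_DIR}
    return {
        # The emptyDir volume lives as long as the pod and is visible to both containers.
        "volumes": [{"name": LIFECYCLE_VOLUME, "emptyDir": {}}],
        "initContainers": [{
            "name": "init-lifecycle",
            "image": init_image,
            # Copy the hook binary into the shared volume before the server starts.
            "command": ["cp", SOURCE_BINARY, LIFECYCLE_BINARY],
            "volumeMounts": [dict(mount)],
        }],
        "containers": [{
            "name": "server",
            "image": server_image,
            "volumeMounts": [dict(mount)],
            # The preStop hook runs the copied binary, which waits for the finalizers.
            "lifecycle": {"preStop": {"exec": {"command": [LIFECYCLE_BINARY, "preStop"]}}},
        }],
    }
```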

## Finalizers

The ArangoDB operator adds the following finalizers to `Pods`:

- `dbserver.database.arangodb.com/drain`: Added to DBServers, removed only when the dbserver can be restarted or is completely drained.
- `agent.database.arangodb.com/agency-serving`: Added to Agents, removed only when enough agents are left to keep the agency serving.

The ArangoDB operator adds the following finalizer to `PersistentVolumeClaims`:

- `pvc.database.arangodb.com/member-exists`: Removed only when its member no longer exists or can be safely rebuilt.
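
The finalizer assignment and the PVC removal condition can be summarised in a few lines. This sketch uses the finalizer names listed above; the role strings (`"dbserver"`, `"agent"`, ...) and the two boolean inputs of `can_remove_pvc_finalizer` are hypothetical simplifications of whatever state the operator actually inspects.

```python
from typing import List

# Finalizer names as listed above.
DBSERVER_DRAIN = "dbserver.database.arangodb.com/drain"
AGENT_AGENCY_SERVING = "agent.database.arangodb.com/agency-serving"
PVC_MEMBER_EXISTS = "pvc.database.arangodb.com/member-exists"

# Hypothetical role names used only for this sketch.
POD_FINALIZERS = {
    "dbserver": [DBSERVER_DRAIN],
    "agent": [AGENT_AGENCY_SERVING],
}


def finalizers_for(kind: str, role: str) -> List[str]:
    """Finalizers to put on a newly created Pod or PersistentVolumeClaim of a given role."""
    if kind == "Pod":
        # Only dbserver and agent pods get a finalizer; other roles get none.
        return list(POD_FINALIZERS.get(role, []))
    if kind == "PersistentVolumeClaim":
        return [PVC_MEMBER_EXISTS]
    return []


def can_remove_pvc_finalizer(member_exists: bool, can_rebuild: bool) -> bool:
    """`member-exists` may be removed once the member is gone or can safely be rebuilt."""
    return not member_exists or can_rebuild
```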

# Pod Eviction & Replacement

This chapter specifies the rules for evicting pods from nodes and
for restarting or replacing them.

## Eviction

Eviction is the process of removing a pod that is running on a node from that node.

This is typically the result of a drain action (`kubectl drain`) or
of a taint being added to a node (either automatically by Kubernetes or manually by a cluster administrator).

## Replacement

Replacement is the process of replacing a pod by another pod that takes over the responsibilities
of the original pod.

The replacement pod has a new ID and new (that is, empty) persistent data.

Note that replacing a pod is different from restarting a pod. A pod is restarted when it has been reported
to have terminated.

## NoExecute Tolerations

NoExecute tolerations control how Kubernetes treats a pod when the node
that the pod is running on becomes unreachable or not-ready.

See the applicable [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) for more information.
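
The effect of a `NoExecute` toleration can be approximated as follows: a pod without a matching toleration is evicted immediately, a pod whose toleration omits `tolerationSeconds` stays bound indefinitely, and otherwise it is evicted once that many seconds have passed since the taint was added. This is a simplified model for illustration only; the matcher handles just `operator: Exists` with an explicit key, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

Toleration = Dict[str, Any]


def matching_toleration(tolerations: List[Toleration], taint_key: str) -> Optional[Toleration]:
    """Simplified matcher: only `operator: Exists` with an explicit key and NoExecute effect."""
    for t in tolerations:
        if t.get("key") == taint_key and t.get("operator") == "Exists" and t.get("effect") == "NoExecute":
            return t
    return None


def is_evicted(tolerations: List[Toleration], taint_key: str, seconds_since_taint: float) -> bool:
    """Approximate whether a NoExecute taint with `taint_key` has evicted the pod by now."""
    t = matching_toleration(tolerations, taint_key)
    if t is None:
        return True  # no matching toleration: the pod is evicted right away
    limit = t.get("tolerationSeconds")
    if limit is None:
        return False  # tolerationSeconds omitted: the pod stays bound to the node indefinitely
    return seconds_since_taint >= limit  # evicted once the toleration period has elapsed
```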

## Rules

The rules for eviction & replacement are specified per type of pod.

### Image ID Pods

Image ID pods are started to determine the ArangoDB version of a specific
ArangoDB image and the Docker sha256 digest of that image.
They have no persistent state.

- Image ID pods can always be evicted from any node.
- Image ID pods can always be restarted on a different node.
  There is no need to replace an image ID pod, nor will it cause problems when
  two image ID pods run at the same time.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set very low (5sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set very low (5sec)

### Coordinator Pods

Coordinator pods run an ArangoDB coordinator as part of an ArangoDB cluster.
They have no persistent state, but do have a unique ID.

- Coordinator pods can always be evicted from any node.
- Coordinator pods can always be replaced with another coordinator pod with a different ID on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)

### DBServer Pods

DBServer pods run an ArangoDB dbserver as part of an ArangoDB cluster.
They have persistent state potentially tied to the node they run on, and they have a unique ID.

- DBServer pods can be evicted from any node as soon as:
  - It has been completely drained AND
  - It is no longer the shard master for any shard
- DBServer pods can be replaced with another dbserver pod with a different ID on a different node when:
  - It is not the shard master for any shard OR
  - For every shard it is the master for, there is an in-sync follower
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set high to "wait it out a while" (5min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set high to "wait it out a while" (5min)
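
The two DBServer conditions can be expressed as predicates. This is a sketch under assumptions: `DBServerState` is a hypothetical summary of cluster state, where `shards_mastered` maps each shard this server is master for to its number of in-sync followers. Note that `all()` over an empty mapping is `True`, which covers the "not the shard master for any shard" case of the replacement rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DBServerState:
    """Hypothetical view of one dbserver, used only for this sketch."""
    drained: bool  # the server has been completely cleaned out
    # shard name -> number of in-sync followers, for shards this server is master for
    shards_mastered: Dict[str, int] = field(default_factory=dict)


def dbserver_can_be_evicted(s: DBServerState) -> bool:
    """Completely drained AND no longer shard master for any shard."""
    return s.drained and not s.shards_mastered


def dbserver_can_be_replaced(s: DBServerState) -> bool:
    """Master for no shard OR every mastered shard has an in-sync follower."""
    return all(followers > 0 for followers in s.shards_mastered.values())
```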

### Agent Pods

Agent pods run an ArangoDB agent as part of an ArangoDB agency.
They have persistent state potentially tied to the node they run on, and they have a unique ID.

- Agent pods can be evicted from any node as soon as:
  - It is no longer the agency leader AND
  - There is at least one agency leader that is responding AND
  - There is at least one agency follower that is responding
- Agent pods can be replaced with another agent pod with the same ID but wiped persistent state on a different node when:
  - The old pod is known to be deleted (e.g. explicit eviction)
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set high to "wait it out a while" (5min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set high to "wait it out a while" (5min)
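
The agent rules reduce to two small predicates. `AgencyView` is a hypothetical snapshot of the agency as seen by the operator; the sketch assumes that `responding_followers` counts responding followers other than the agent being evicted, which the rule above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class AgencyView:
    """Hypothetical snapshot of the agency, used only for this sketch."""
    is_leader: bool            # the agent to be evicted is the current agency leader
    leader_responding: bool    # an agency leader is responding
    responding_followers: int  # responding followers other than this agent (assumed)


def agent_can_be_evicted(v: AgencyView) -> bool:
    """Not the leader AND a responding leader AND at least one responding follower."""
    return (not v.is_leader) and v.leader_responding and v.responding_followers >= 1


def agent_can_be_replaced(old_pod_deleted: bool) -> bool:
    """Replacement (same ID, wiped state) only once the old pod is known to be deleted."""
    return old_pod_deleted
```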

### Single Server Pods

Single server pods run an ArangoDB server as part of an ArangoDB single server deployment.
They have persistent state potentially tied to the node.

- Single server pods cannot be evicted from any node.
- Single server pods cannot be replaced with another pod.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is not set ("wait it out forever")
- `node.kubernetes.io/not-ready:NoExecute` toleration time is not set ("wait it out forever")

### Single Pods in Active Failover Deployment

Single pods run an ArangoDB single server as part of an ArangoDB active failover deployment.
They have persistent state potentially tied to the node they run on, and they have a unique ID.

- Single pods can be evicted from any node as soon as:
  - It is a follower of an active-failover deployment (Q: can we trigger this failover to another server?)
- Single pods can always be replaced with another single pod with a different ID on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set high to "wait it out a while" (5min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set high to "wait it out a while" (5min)

### SyncMaster Pods

SyncMaster pods run ArangoSync in master mode as part of an ArangoDB DC2DC cluster.
They have no persistent state, but do have a unique address.

- SyncMaster pods can always be evicted from any node.
- SyncMaster pods can always be replaced with another syncmaster pod on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set low (15sec)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set low (15sec)

### SyncWorker Pods

SyncWorker pods run ArangoSync in worker mode as part of an ArangoDB DC2DC cluster.
They have no persistent state, but do have in-memory state and a unique address.

- SyncWorker pods can always be evicted from any node.
- SyncWorker pods can always be replaced with another syncworker pod on a different node.
- `node.kubernetes.io/unreachable:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
- `node.kubernetes.io/not-ready:NoExecute` toleration time is set a bit higher to try to avoid resynchronization (1min)
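
The toleration times from the rules above can be collected into one table and turned into Kubernetes `tolerations` entries (`key`, `operator`, `effect`, `tolerationSeconds` are standard Kubernetes toleration fields). The role keys and the function name are hypothetical; this is an illustration of the table, not the operator's implementation.

```python
from typing import Any, Dict, List, Optional

# Toleration times in seconds from the rules above.
# None means tolerationSeconds is omitted, i.e. "wait it out forever".
# Role keys are hypothetical names used only for this sketch.
TOLERATION_SECONDS: Dict[str, Optional[int]] = {
    "imageid": 5,
    "coordinator": 15,
    "dbserver": 300,
    "agent": 300,
    "single": None,                 # single server deployment
    "single-activefailover": 300,   # single pod in an active failover deployment
    "syncmaster": 15,
    "syncworker": 60,
}

TAINT_KEYS = ("node.kubernetes.io/unreachable", "node.kubernetes.io/not-ready")


def no_execute_tolerations(role: str) -> List[Dict[str, Any]]:
    """Build the unreachable/not-ready NoExecute tolerations for a pod of the given role."""
    seconds = TOLERATION_SECONDS[role]
    result = []
    for key in TAINT_KEYS:
        toleration: Dict[str, Any] = {"key": key, "operator": "Exists", "effect": "NoExecute"}
        if seconds is not None:
            toleration["tolerationSeconds"] = seconds
        result.append(toleration)
    return result
```

Omitting `tolerationSeconds` (rather than choosing a very large value) is how a `NoExecute` toleration expresses "tolerate forever" in Kubernetes, which matches the single server rule.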

```yaml
apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "cluster1-with-sync"
spec:
  mode: Cluster
  image: ewoutp/arangodb:3.3.8
  tls:
    altNames: ["kube-01", "kube-02", "kube-03"]
  sync:
    enabled: true
    auth:
      clientCASecretName: client-auth-ca
    externalAccess:
      type: LoadBalancer
      loadBalancerIP: 192.168.140.210
```

```yaml
apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "cluster2-with-sync"
spec:
  mode: Cluster
  image: ewoutp/arangodb:3.3.8
  tls:
    altNames: ["kube-01", "kube-02", "kube-03"]
  sync:
    enabled: true
    auth:
      clientCASecretName: client-auth-ca
    externalAccess:
      type: LoadBalancer
      loadBalancerIP: 192.168.140.211
```