diff --git a/.prh.yaml b/.prh.yaml index 7c5f727ce1..c0705c8b62 100644 --- a/.prh.yaml +++ b/.prh.yaml @@ -15,11 +15,14 @@ # version: 1 # index: ann +# index: besteffort +# index: burstable # index: clojure # index: docker # index: flamegraph # index: go # index: grafana +# index: guaranteed # index: helm # index: jaeger # index: java @@ -33,6 +36,7 @@ version: 1 # index: pyroscope # index: python # index: qbg +# index: QoS # index: sdk # index: uuid # index: vald @@ -46,6 +50,20 @@ rules: to: ANN options: wordBoundary: true + - pattern: besteffort + expected: BestEffort + specs: + - from: besteffort + to: besteffort + - from: BestEffort + to: BestEffort + - pattern: burstable + expected: Burstable + specs: + - from: burstable + to: burstable + - from: Burstable + to: Burstable - pattern: clojure expected: Clojure options: @@ -80,6 +98,13 @@ rules: expected: Grafana options: wordBoundary: true + - pattern: guaranteed + expected: Guaranteed + specs: + - from: guaranteed + to: guaranteed + - from: Guaranteed + to: Guaranteed - pattern: helm expected: Helm options: @@ -105,7 +130,7 @@ rules: expected: _k_-NN options: wordBoundary: true - - pattern: + - pattern: - kubernetes - k8s - K8s @@ -147,6 +172,10 @@ rules: expected: QBG options: wordBoundary: true + - pattern: qos + expected: QoS + options: + wordBoundary: true - pattern: sdk expected: SDK options: diff --git a/docs/user-guides/capacity-plannig.md b/docs/user-guides/capacity-plannig.md new file mode 100644 index 0000000000..bed730e3db --- /dev/null +++ b/docs/user-guides/capacity-plannig.md @@ -0,0 +1,178 @@ +# Capacity Planning + +## What is capacity planning for the Vald cluster? + +Capacity planning is essential before deploying the Vald cluster to the cloud service. +There are three viewpoints: Vald cluster view, Kubernetes view, and Component view. +Let's see each view. + +## Vald cluster view + +The essential point at the Vald cluster view is the hardware specification, especially RAM. +The Vald cluster, especially Vald Agent components, requires much RAM capacity because the vector index is stored in memory. + +It is easy to figure out the minimum required RAM capacity by the following formula. + +```bash +( { the dimension vector } × { bit number of vector } + { the bit of vectors ID string } ) × { the maximum number of the vector } × { the index replica } +``` + +For example, if you want to insert 1 million vectors with 900 dimensions and the object type is 32-bit with 32 byte (256 bit) ID, and the index replica is 3, the minimum required RAM capacity is: + +```bash +(900 × 32 + 256 ) × 1,000,000 × 3 = 8,7168,000,000 (bit) = 10.896 (GB) +``` + +It is just the minimum required RAM for indexing. +Considering the margin of RAM capacity, the minimum RAM capacity should be less than 60% of the actual RAM capacity. +Therefore, the actual minimum RAM capacity will be: + +```bash +8,7168,000,000 (bit) / 0.6 = 145,280,000,000 (bit) = 18.16 (GB) +``` + +
+In production usage, the minimum required RAM may not be enough for the actual memory usage.
+E.g., there may be a noisy neighbor problem, high memory usage during createIndex (indexing on memory), high traffic that needs more memory, etc.
+</div>
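+
+The following shell snippet is a minimal sketch of the calculation above. The variable values are only the example figures used so far (900 dimensions, 32-bit elements, a 256-bit ID, 1 million vectors, index replica 3) plus the 60% margin; replace them with your own workload figures.
+
+```bash
+# Hypothetical example values; adjust them to your own workload.
+DIMENSION=900      # vector dimension
+VECTOR_BITS=32     # bits per vector element (e.g., 32-bit float)
+ID_BITS=256        # bits of the vector ID string (32 bytes)
+VECTORS=1000000    # maximum number of vectors
+REPLICAS=3         # index replica
+
+# Minimum index size in bits, following the formula above.
+BITS=$(( (DIMENSION * VECTOR_BITS + ID_BITS) * VECTORS * REPLICAS ))
+
+# Convert to GB, then apply the 60% memory margin.
+echo "minimum index size: $(echo "$BITS / 8 / 1000^3" | bc -l) GB"
+echo "recommended RAM   : $(echo "$BITS / 0.6 / 8 / 1000^3" | bc -l) GB"
+```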
+
+## Kubernetes cluster view
+
+### Pod priority & QoS
+
+When the Node capacity (e.g., RAM, CPU) reaches its limit, Kubernetes decides which Pods to kill according to QoS and Pod priority.
+Kubernetes schedules Pods with the Pod's PriorityClass as the first priority and its QoS class as the second priority.
+
+**Pod priority**
+
+Pod priority is an integer value: the higher the value, the higher the priority.
+
+Each Vald component has a default priority value:
+
+- Agent: 1000000000
+- Discoverer: 1000000
+- Filter Gateway: 1000000
+- LB Gateway: 1000000
+- Index Manager: 1000000
+
+Therefore, the order of priority is as follows:
+
+```bash
+Agent > Discoverer = Filter Gateway = LB Gateway = Index Manager
+```
+
+These values are helpful when Pods other than the Vald components run on the same Node.
+
+It is easy to change them by editing your `values.yaml`.
+
+```yaml
+# e.g. LB Gateway podPriority settings.
+...
+  gateway:
+    lb:
+      ...
+      podPriority:
+        enabled: true
+        value: {new values}
+      ...
+```
+
+**QoS**
+
+The QoS class is one of Guaranteed, Burstable, or BestEffort, in descending order of priority, and Kubernetes kills Pods in ascending order of importance (BestEffort first).
+
+Resource requests and limits determine the QoS class.
+
+The table below shows the conditions for each QoS class.
+
+| QoS | request CPU | request Memory | limit CPU | limit Memory | Note |
+| :--------: | :-------------: | :------------: | :-------------: | :-------------: | :---------------------------------- |
+| Guaranteed | :o: | :o: | :o: | :o: | All settings are required. |
+| Burstable | :o: (:warning:) | :o: (:warning:) | :o: (:warning:) | :o: (:warning:) | One to three settings are required. |
+| BestEffort | :x: | :x: | :x: | :x: | No setting is required. |
+
+Vald requires a large amount of RAM because of its on-memory indexing, so we highly recommend that you do not specify a limit, especially for the Vald Agent.
+In this case, the QoS will be Burstable.
+
+**Throttling**
+
+CPU throttling affects Pod performance.
+
+If it occurs, the Vald cluster operator should reconsider each component's CPU resource request and limit.
+It is easy to change by editing the `values.yaml` file and applying it.
+
+```yaml
+# e.g. LB Gateway resources settings.
+...
+  gateway:
+    lb:
+      ...
+      resources:
+        requests:
+          cpu: 200m
+          memory: 150Mi
+        limits:
+          cpu: 2000m
+          memory: 700Mi
+      ...
+```
+
+<div class="notice">
+Please pay attention to Pod priority and QoS.
+</div>
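+
+To verify how these settings end up on a running Pod, you can check the QoS class and priority that Kubernetes assigned to it. The commands below are a minimal sketch; the namespace and Pod name are examples, so replace them with your own.
+
+```bash
+# QoS class assigned to the Pod (Guaranteed, Burstable, or BestEffort).
+kubectl get pod vald-agent-ngt-0 -n default -o jsonpath='{.status.qosClass}{"\n"}'
+
+# PriorityClass name and resolved priority value of the Pod.
+kubectl get pod vald-agent-ngt-0 -n default -o jsonpath='{.spec.priorityClassName}{" "}{.spec.priority}{"\n"}'
+```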
+
+### Node & Pod affinity
+
+The Kubernetes scheduler assigns Pods to Nodes based on resource availability.
+
+In production usage, other components often run on the same Kubernetes cluster as the Vald cluster.
+Depending on the situation, you may want to deploy some components to a different Node: e.g., a machine learning component that requires a lot of memory may need to run on an independent Node.
+
+In this situation, we recommend setting the affinity/anti-affinity configuration for each Vald component.
+It is easy to change by editing each component's settings in your `values.yaml`.
+
+<div class="notice">
+The affinity setting for the Vald Agent is especially significant for the Vald cluster.
+Please DO NOT remove the default settings. +
+ +```yaml +# e.g. Agent's affinity settings +... + agent: + ... + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: [] + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: [] + podAffinity: + preferredDuringSchedulingIgnoredDuringExecution: [] + requiredDuringSchedulingIgnoredDuringExecution: [] + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + podAffinityTerm: + topologyKey: kubernetes.io/hostname + labelSelector: + matchExpressions: + - key: app + operator: In + values: + - vald-agent-ngt + requiredDuringSchedulingIgnoredDuringExecution: [] + ... +``` + +For more information about Kubernetes affinity, please refer to [here](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) + +## Component view + +Depending on the customization of each component for each user, there are some points to be aware of. + +**Index Manager** + +If the `saveIndex` is executed frequently, the backup data per unit time will increase, which consumes bandwidth. + +Similarly, as the saveIndex concurrency increases, the backup data per unit time increases. diff --git a/example/helm/values.yaml b/example/helm/values.yaml index f514ade125..d923d80502 100644 --- a/example/helm/values.yaml +++ b/example/helm/values.yaml @@ -17,7 +17,7 @@ defaults: logging: level: debug image: - tag: "latest" + tag: "nightly" server_config: healths: liveness: