From 3477053cdc3d656191a562bc47cc04b2a002291d Mon Sep 17 00:00:00 2001 From: vankichi Date: Mon, 1 Aug 2022 14:28:09 +0900 Subject: [PATCH 1/8] :pencil: add backup configuration document Signed-off-by: vankichi --- docs/user-guides/backup-configuration.md | 82 ++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 docs/user-guides/backup-configuration.md diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md new file mode 100644 index 0000000000..2d6e6fbff1 --- /dev/null +++ b/docs/user-guides/backup-configuration.md @@ -0,0 +1,82 @@ +# Backup configuration + +There are three types of options for the Vald cluster: backup, filtering, and core algorithms. + +This page describes how enabled backup features for your Vald cluster. + +## What is the backup + +Vald's backup function is to save the index information indexed in each Vald Agent pod as file data in Persistent Volume or S3. +When the Vald Agent pod is restarted for some reason, the index state is restored from the saved index data. + +## Backup configuration + +This section shows the best practice backup configuration with PV, S3, or PV + S3. + +Each sample configuration yaml is published on [here](https://github.com/vdaas/vald/tree/master/charts/vald/values). +Please try to reference. + +### General + +Regardless of the backup destination, the following Vald Agent settings must be set to enable backup. +`in-memory-mode=false` means storing index files in the local volume. +`index_path` is the location where those files are stored. + +### Persistent Volume + +#### requirement + +You must prepare PV before deployment when using Kubernetes Persistent Volume (PV) for backup storage. +Please refer to the setup guide of the usage environment for the provisioning PV. 
+ +For example: + - [GKE setup PV document](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes) + - [EKS storage document](https://docs.aws.amazon.com/eks/latest/userguide/storage.html) + +#### configuration + +After provisioning PV, the following parameters are needed to be set. + + + +Each PV will be mounted on each Vald Agent Pod's `index_path`. + +It is highly recommended to set `copy_on_write` (CoW) as `true`. + +The CoW is an option to update the backup file safely. +The backup file may be corrupted, and the Vald Agent pod may not restore from backup files when the Vald Agent pod terminates during saveIndex without CoW is not be enabled. +On the other hand, when CoW is enabled, the Vald Agent pod can restore the data from one generation ago. + +### S3 + +#### requirement + +Before deployment, you must provision the S3 object storage. +You can use any S3-compatible object storage. + - [AWS S3](https://aws.amazon.com/s3/) + - [Google Cloud Storage](https://cloud.google.com/storage/docs/) + +#### configuration + +After provisioning the object storage, the following parameters are needed to be set. +To enable the backup function with S3, the Vald Agent Sidecar should be enabled. + + + +The Vald Agent Sidecar needs an access key and a secret access key to communicate with your object storage. +Before applying the helm chart, register each value in Kubernetes secrets with the following commands. + + + +### Persistent Volume and S3 + +You can use both PV and S3 at the same time. + + + +## Restore + +Restoring from the backup file runs on initContainer when the config is set correctly and the backup file exists. + +If you use both PV and S3, the backup file used for restoration will prioritize the file on PV. +If the backup file does not exist on the PV, the backup file will be retrieved from S3 via the Vald Agent Sidecar and restored. 
From a68e0dbd56d8d3d30aad353be89d56d9bbfb43f3 Mon Sep 17 00:00:00 2001 From: vankichi Date: Mon, 1 Aug 2022 15:09:58 +0900 Subject: [PATCH 2/8] :pencil: add sample yaml Signed-off-by: vankichi --- docs/user-guides/backup-configuration.md | 132 ++++++++++++++++++++--- 1 file changed, 120 insertions(+), 12 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 2d6e6fbff1..16e7eee3eb 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -19,8 +19,22 @@ Please try to reference. ### General Regardless of the backup destination, the following Vald Agent settings must be set to enable backup. -`in-memory-mode=false` means storing index files in the local volume. -`index_path` is the location where those files are stored. +- `index_path` is the location where those files are stored. +- `in-memory-mode=false` means storing index files in the local volume. + +In addition, `agent.terminationGracePeriodSeconds` value should be long enough to ensure the backup speed. + +```yaml +agent: + ... + # We recommend setting this value long enough to ensure the backup speed of PV, since the Index is backed up at the end of the pod. + terminationGracePeriodSeconds: 3600 + ngt: + ... + index_path: "/var/ngt/index" + enable_in_memory_mode: false + ... +``` ### Persistent Volume @@ -30,22 +44,37 @@ You must prepare PV before deployment when using Kubernetes Persistent Volume (P Please refer to the setup guide of the usage environment for the provisioning PV. 
For example: - - [GKE setup PV document](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes) - - [EKS storage document](https://docs.aws.amazon.com/eks/latest/userguide/storage.html) +- [GKE setup PV document](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes) +- [EKS storage document](https://docs.aws.amazon.com/eks/latest/userguide/storage.html) #### configuration After provisioning PV, the following parameters are needed to be set. - +```yaml +agent: + ... + persistentVolume: + # use PV flag + enabled: true + # accessMode for PV (please verify your environment) + accessMode: ReadWriteOncePod + # storage class for PV (please verify your environment) + storageClass: local-path + # set enough size for backup + size: 2Gi + ... +``` Each PV will be mounted on each Vald Agent Pod's `index_path`. It is highly recommended to set `copy_on_write` (CoW) as `true`. -The CoW is an option to update the backup file safely. -The backup file may be corrupted, and the Vald Agent pod may not restore from backup files when the Vald Agent pod terminates during saveIndex without CoW is not be enabled. +
+The CoW is an option to update the backup file safely.
+The backup file may be corrupted, and the Vald Agent pod may not restore from backup files when the Vald Agent pod terminates during saveIndex without CoW is not be enabled.
On the other hand, when CoW is enabled, the Vald Agent pod can restore the data from one generation ago. +
### S3 @@ -53,26 +82,105 @@ On the other hand, when CoW is enabled, the Vald Agent pod can restore the data Before deployment, you must provision the S3 object storage. You can use any S3-compatible object storage. - - [AWS S3](https://aws.amazon.com/s3/) - - [Google Cloud Storage](https://cloud.google.com/storage/docs/) + +For example: +- [AWS S3](https://aws.amazon.com/s3/) +- [Google Cloud Storage](https://cloud.google.com/storage/docs/) #### configuration After provisioning the object storage, the following parameters are needed to be set. To enable the backup function with S3, the Vald Agent Sidecar should be enabled. - +```yaml +agent: + ... + sidecar: + enabled: true + initContainerEnabled: true + # This is the Amazon S3 settings. + # Please change it according to your environment. + config: + blob_storage: + # storage type (default: s3) + storage_type: "s3" + # your bucket name + bucket: "vald" + s3: + region: "us-central1" + # If you enable sidecar, the following environment variables will be created automatically by default values. + # So, please create the 'aws-secret' resource before deploying. + # env: + # - name: AWS_ACCESS_KEY + # valueFrom: + # secretKeyRef: + # name: aws-secret + # key: access-key + # - name: AWS_SECRET_ACCESS_KEY + # valueFrom: + # secretKeyRef: + # name: aws-secret + # key: secret-access-key +``` The Vald Agent Sidecar needs an access key and a secret access key to communicate with your object storage. Before applying the helm chart, register each value in Kubernetes secrets with the following commands. - +```bash +kubectl create secret -n aws-secret --access-key= --secret-access-key= +``` ### Persistent Volume and S3 You can use both PV and S3 at the same time. 
- +```yaml +agent: + minReplicas: 9 + maxReplicas: 9 + podManagementPolicy: Parallel + resources: + requests: + cpu: 100m + memory: 50Mi + terminationGracePeriodSeconds: 3600 + persistentVolume: + enabled: true + accessMode: ReadWriteOncePod + storageClass: local-path + size: 2Gi + ngt: + dimension: 784 + index_path: "/var/ngt/index" + enable_in_memory_mode: false + auto_index_duration_limit: 730h + auto_index_check_duration: 24h + auto_index_length: 1000 + auto_save_index_duration: 365h + auto_create_index_pool_size: 1000 + sidecar: + enabled: true + initContainerEnabled: true + config: + blob_storage: + storage_type: "s3" + bucket: "vald" + s3: + region: "us-central1" + # If you enable sidecar, the following environment variables will be created automatically by default values. + # So, please create the 'aws-secret' resource before deploying. + # env: + # - name: AWS_ACCESS_KEY + # valueFrom: + # secretKeyRef: + # name: aws-secret + # key: access-key + # - name: AWS_SECRET_ACCESS_KEY + # valueFrom: + # secretKeyRef: + # name: aws-secret + # key: secret-access-key +``` ## Restore From de195807168ad43d854aa6b9605f5c72793ae0cc Mon Sep 17 00:00:00 2001 From: Kiichiro YUKAWA Date: Tue, 2 Aug 2022 17:03:08 +0900 Subject: [PATCH 3/8] Apply suggestions from code review Co-authored-by: Kevin Diu --- docs/user-guides/backup-configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 16e7eee3eb..76a6fe27d0 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -6,7 +6,7 @@ This page describes how enabled backup features for your Vald cluster. ## What is the backup -Vald's backup function is to save the index information indexed in each Vald Agent pod as file data in Persistent Volume or S3. +Vald's backup function is to save the index data in each Vald Agent pod as a data file to the Persistent Volume or S3. 
When the Vald Agent pod is restarted for some reason, the index state is restored from the saved index data. ## Backup configuration @@ -19,7 +19,7 @@ Please try to reference. ### General Regardless of the backup destination, the following Vald Agent settings must be set to enable backup. -- `index_path` is the location where those files are stored. +- `index_path` is the backup file location. - `in-memory-mode=false` means storing index files in the local volume. In addition, `agent.terminationGracePeriodSeconds` value should be long enough to ensure the backup speed. From 4ac22f0bad0726be0fbe19cc83c731f7c51fbf05 Mon Sep 17 00:00:00 2001 From: Kiichiro YUKAWA Date: Tue, 2 Aug 2022 20:16:19 +0900 Subject: [PATCH 4/8] Apply suggestions from code review Co-authored-by: Kevin Diu --- docs/user-guides/backup-configuration.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 76a6fe27d0..4b76f7e02c 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -2,7 +2,7 @@ There are three types of options for the Vald cluster: backup, filtering, and core algorithms. -This page describes how enabled backup features for your Vald cluster. +This page describes how to enable the backup feature on your Vald cluster. ## What is the backup @@ -11,10 +11,10 @@ When the Vald Agent pod is restarted for some reason, the index state is restore ## Backup configuration -This section shows the best practice backup configuration with PV, S3, or PV + S3. +This section shows the best practice for configuring backup features with PV, S3, or PV + S3. Each sample configuration yaml is published on [here](https://github.com/vdaas/vald/tree/master/charts/vald/values). -Please try to reference. +Please refer it for more details. ### General @@ -72,7 +72,7 @@ It is highly recommended to set `copy_on_write` (CoW) as `true`.
The CoW is an option to update the backup file safely.
-The backup file may be corrupted, and the Vald Agent pod may not restore from backup files when the Vald Agent pod terminates during saveIndex without CoW is not be enabled.
+The backup file may be corrupted, and the Vald Agent pod may not be able to restore from backup files when the Vald Agent pod terminates during the save index function without CoW is not be enabled.
On the other hand, when CoW is enabled, the Vald Agent pod can restore the data from one generation ago.
From f3e846928fc8d2c43d11375962b6ca99fb2cb62e Mon Sep 17 00:00:00 2001 From: vankichi Date: Tue, 23 Aug 2022 10:22:20 +0900 Subject: [PATCH 5/8] :pencil: apply feedback Signed-off-by: vankichi --- docs/user-guides/backup-configuration.md | 104 ++++++++++++++++++----- 1 file changed, 81 insertions(+), 23 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 4b76f7e02c..d983b6530f 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -9,6 +9,22 @@ This page describes how to enable the backup feature on your Vald cluster. Vald's backup function is to save the index data in each Vald Agent pod as a data file to the Persistent Volume or S3. When the Vald Agent pod is restarted for some reason, the index state is restored from the saved index data. +## Backup methods + +You can choose one of three types of backup methods. + +- PV (recommended) +- S3 +- PV + S3 + +Please refer to the following tables and decide which method fit for your case. + +| | PV | S3 | PV+S3 | +| :---------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------- | +| usecase | Want to use backup with low cost
Would not like to use some external storage for backup
Want to backup with highly compatible storage with Kubernetes | Want to use the same backup file with several Vald clusters
Want to access the backup files easily | Want to use backup with PV basically and access the backup files easily
Want to prevent backup file failure due to Kubernetes cluster failure | +| merit :+1: | Easy to use
Highly compatible with Kubernetes
Complete with internal network
Safety backup if applying CoW | Easy to access backup files | It fan be shared and used by multiple clusters | The safest of these methods | +| demerit :x: | A bit hard to check backup files
Can not share backup files for several Vald clusters | Need to communicate with external network | Need to operate both storages
The most expensive way | + ## Backup configuration This section shows the best practice for configuring backup features with PV, S3, or PV + S3. @@ -19,8 +35,9 @@ Please refer it for more details. ### General Regardless of the backup destination, the following Vald Agent settings must be set to enable backup. -- `index_path` is the backup file location. -- `in-memory-mode=false` means storing index files in the local volume. + +- `agent.ngt.index_path` is the location where those files are stored. +- `agent.ngt.in-memory-mode=false` means storing index files in the local volume. In addition, `agent.terminationGracePeriodSeconds` value should be long enough to ensure the backup speed. @@ -44,6 +61,7 @@ You must prepare PV before deployment when using Kubernetes Persistent Volume (P Please refer to the setup guide of the usage environment for the provisioning PV. For example: + - [GKE setup PV document](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes) - [EKS storage document](https://docs.aws.amazon.com/eks/latest/userguide/storage.html) @@ -64,16 +82,29 @@ agent: # set enough size for backup size: 2Gi ... + terminationGracePeriodSeconds: 3600 + ... + ngt: + ... + index_path: "/var/ngt/index" + enable_in_memory_mode: false + # copy on write function flag + enable_copy_on_write: true + ... ``` Each PV will be mounted on each Vald Agent Pod's `index_path`. -It is highly recommended to set `copy_on_write` (CoW) as `true`. +You can choose `copy_on_write` (CoW) function. + +The CoW is an option to update the backup file safely. -
-The CoW is an option to update the backup file safely.
-The backup file may be corrupted, and the Vald Agent pod may not be able to restore from backup files when the Vald Agent pod terminates during the save index function without CoW is not be enabled.
-On the other hand, when CoW is enabled, the Vald Agent pod can restore the data from one generation ago.
+If the Vald Agent pod terminates during saveIndex while CoW is disabled, the backup file may be corrupted, and the pod may fail to restore from it.
+On the other hand, if CoW is enabled, the Vald Agent pod can restore the data from one generation ago.
+
+
+When CoW is enabled, the PV temporarily holds two backup files: the new and the old versions.
+So, double the storage capacity is required when CoW is enabled; e.g., if the size is 2Gi without CoW, it should be more than 4Gi with CoW.
### S3 @@ -84,6 +115,7 @@ Before deployment, you must provision the S3 object storage. You can use any S3-compatible object storage. For example: + - [AWS S3](https://aws.amazon.com/s3/) - [Google Cloud Storage](https://cloud.google.com/storage/docs/) @@ -95,8 +127,16 @@ To enable the backup function with S3, the Vald Agent Sidecar should be enabled. ```yaml agent: ... + terminationGracePeriodSeconds: 3600 + ... + ngt: + ... + index_path: "/var/ngt/index" + enable_in_memory_mode: false + ... sidecar: enabled: true + # run sidecar with initContainerMode or not initContainerEnabled: true # This is the Amazon S3 settings. # Please change it according to your environment. @@ -133,37 +173,39 @@ kubectl create secret -n aws-secret --access-key= Date: Tue, 23 Aug 2022 10:24:14 +0900 Subject: [PATCH 6/8] :pencil: apply feedback Signed-off-by: vankichi --- docs/user-guides/backup-configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index d983b6530f..65e166a428 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -20,9 +20,9 @@ You can choose one of three types of backup methods. Please refer to the following tables and decide which method fit for your case. 
| | PV | S3 | PV+S3 | -| :---------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------- | +| :---------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | | usecase | Want to use backup with low cost
Would not like to use some external storage for backup
Want to backup with highly compatible storage with Kubernetes | Want to use the same backup file with several Vald clusters
Want to access the backup files easily | Want to use backup with PV basically and access the backup files easily
Want to prevent backup file failure due to Kubernetes cluster failure | -| merit :+1: | Easy to use
Highly compatible with Kubernetes
Complete with internal network
Safety backup if applying CoW | Easy to access backup files | It fan be shared and used by multiple clusters | The safest of these methods | +| merit :+1: | Easy to use
Highly compatible with Kubernetes
Complete with internal network
Safety backup if applying CoW | Easy to access backup files
It fan be shared and used by multiple clusters | The safest of these methods | | demerit :x: | A bit hard to check backup files
Can not share backup files for several Vald clusters | Need to communicate with external network | Need to operate both storages
The most expensive way | ## Backup configuration From 4f24577ad9cd55ef0c56621ddd94824fa796b845 Mon Sep 17 00:00:00 2001 From: Kiichiro YUKAWA Date: Thu, 25 Aug 2022 13:28:33 +0900 Subject: [PATCH 7/8] Apply suggestions from code review Co-authored-by: Yusuke Kato --- docs/user-guides/backup-configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 65e166a428..6d3cc447a2 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -22,8 +22,8 @@ Please refer to the following tables and decide which method fit for your case. | | PV | S3 | PV+S3 | | :---------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | | usecase | Want to use backup with low cost
Would not like to use some external storage for backup
Want to backup with highly compatible storage with Kubernetes | Want to use the same backup file with several Vald clusters
Want to access the backup files easily | Want to use backup with PV basically and access the backup files easily
Want to prevent backup file failure due to Kubernetes cluster failure | -| merit :+1: | Easy to use
Highly compatible with Kubernetes
Complete with internal network
Safety backup if applying CoW | Easy to access backup files
It fan be shared and used by multiple clusters | The safest of these methods | -| demerit :x: | A bit hard to check backup files
Can not share backup files for several Vald clusters | Need to communicate with external network | Need to operate both storages
The most expensive way | +| pros :+1: | Easy to use
Highly compatible with Kubernetes
Low latency using in-cluster network
Safety backup using Copy on Write option | Easy to access backup files
It can be shared and used by multiple clusters | The safest of these methods | +| cons :-1: | A bit hard to check backup files
Can not share backup files for several Vald clusters | Need to communicate with external network | Need to operate both storages
The most expensive way | ## Backup configuration From 221fb4fa862755badfdcdb7d7f528bad483fdf33 Mon Sep 17 00:00:00 2001 From: vankichi Date: Thu, 25 Aug 2022 23:32:10 +0900 Subject: [PATCH 8/8] :pencil: update sample yaml Signed-off-by: vankichi --- docs/user-guides/backup-configuration.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/docs/user-guides/backup-configuration.md b/docs/user-guides/backup-configuration.md index 6d3cc447a2..3a148c5637 100644 --- a/docs/user-guides/backup-configuration.md +++ b/docs/user-guides/backup-configuration.md @@ -19,10 +19,10 @@ You can choose one of three types of backup methods. Please refer to the following tables and decide which method fit for your case. -| | PV | S3 | PV+S3 | -| :---------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | -| usecase | Want to use backup with low cost
Would not like to use some external storage for backup
Want to backup with highly compatible storage with Kubernetes | Want to use the same backup file with several Vald clusters
Want to access the backup files easily | Want to use backup with PV basically and access the backup files easily
Want to prevent backup file failure due to Kubernetes cluster failure | -| pros :+1: | Easy to use
Highly compatible with Kubernetes
Low latency using in-cluster network
Safety backup using Copy on Write option | Easy to access backup files
It can be shared and used by multiple clusters | The safest of these methods | +| | PV | S3 | PV+S3 | +| :-------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- | +| usecase | Want to use backup with low cost
 Do not want to use external storage for backup
 Want to back up to storage that is highly compatible with Kubernetes | Want to share the same backup file among several Vald clusters
 Want to access the backup files easily | Want to back up mainly to PV and still access the backup files easily
Want to prevent backup file failure due to Kubernetes cluster failure | +| pros :+1: | Easy to use
Highly compatible with Kubernetes
Low latency using in-cluster network
 Safe backup using the Copy on Write option | Easy to access backup files
It can be shared and used by multiple clusters | The safest of these methods | | cons :-1: | A bit hard to check backup files
Can not share backup files for several Vald clusters | Need to communicate with external network | Need to operate both storages
The most expensive way | ## Backup configuration @@ -68,6 +68,7 @@ For example: #### configuration After provisioning PV, the following parameters are needed to be set. +It shows the example for using GKE. ```yaml agent: @@ -75,10 +76,10 @@ agent: persistentVolume: # use PV flag enabled: true - # accessMode for PV (please verify your environment) - accessMode: ReadWriteOncePod + # accessMode for PV (please verify your environment). + accessMode: ReadWriteOnce # storage class for PV (please verify your environment) - storageClass: local-path + storageClass: standard # set enough size for backup size: 2Gi ... @@ -184,9 +185,9 @@ agent: # use PV flag enabled: true # accessMode for PV (please verify your environment) - accessMode: ReadWriteOncePod + accessMode: ReadWriteOnce # storage class for PV (please verify your environment) - storageClass: local-path + storageClass: standard # set enough size for backup size: 2Gi ngt:
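
A note on the `aws-secret` resource that the sidecar samples in this series expect: the `kubectl create secret` line above appears here without its placeholder values. As a minimal sketch, assuming the default Secret name `aws-secret` and the keys `access-key` and `secret-access-key` taken from the commented `env` block, an equivalent manifest might look like the following. The namespace and both values are hypothetical placeholders, not values from this patch.

```yaml
# Hypothetical sketch; replace the namespace and values with your own.
apiVersion: v1
kind: Secret
metadata:
  # Must match secretKeyRef.name in the sidecar env defaults.
  name: aws-secret
  # Assumption: the namespace where the Vald Agent pods run.
  namespace: default
type: Opaque
stringData:
  # Keys must match secretKeyRef.key in the sidecar env defaults.
  access-key: "YOUR_ACCESS_KEY"
  secret-access-key: "YOUR_SECRET_ACCESS_KEY"
```

Applying a manifest like this with `kubectl apply -f` before installing the chart should let the sidecar resolve `AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY` through the default `secretKeyRef` entries.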