From f07ac59622d2bc2503f6d436116e0e623aa6c055 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Wed, 27 Sep 2023 12:42:46 +0200
Subject: [PATCH 1/9] Add description for multi-worker fluentd

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 90 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 docs/multi-worker.md

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
new file mode 100644
index 000000000..6b843012f
--- /dev/null
+++ b/docs/multi-worker.md
@@ -0,0 +1,90 @@
+# Mutli-Worker Fluentd Setup
+
+## Necessity
+
+In specific scenarios, a fluentd with a single worker instance cannot process and forward the high amount of logs produced on clusters. This can lead to fluentd Pods not accepting additional traffic from fluent-bits and fluent-bits suffering under Backpressure. In the end, both fluentd and fluent-bit Pods might run into their memory limits and get restarted by Kubernetes. Enabling multiple worker processes per fluentd Pod will increase the performance of this component, so it is recommended to use a multi-worker approach in environments with high log volume. Additionally the official [fluentd documentation](https://docs.fluentd.org/deployment/multi-process-workers) might be helpful.
+
+## Recommended implementation
+
+When enabling the multi-worker setup, it is recommended to ensure the following things:
+- Place the fluentds on separate Nodes
+- Increase the compute and memory resources
+- Do not use specific filter plugins
+  - detectExceptions [is not working](https://github.com/kube-logging/logging-operator/issues/1490)
+- Also scale [horizontally](https://kube-logging.dev/docs/logging-infrastructure/fluentd/#autoscaling)
+
+To ensure that the fluentd Pods have enough resources, a common approach is to use specific Nodes for the fluentds and to reserve enough computing and memory resources. A new nodePool should be created with a specific label and a taint. Ideally, the nodeType is compute-optimized. It could look like the following:
+```yaml
+apiVersion: v1
+kind: Node
+metadata:
+  labels:
+    type: cpu
+  name: node1
+spec:
+  taints:
+  - effect: NoSchedule
+    key: type
+    value: cpu
+```
+
+The corresponding setting in the Logging-CRD is looking like follows:
+```yaml
+fluentd:
+  nodeSelector:
+    type: cpu
+  tolerations:
+  - effect: NoSchedule
+    key: type
+    operator: Equal
+    value: cpu
+```
+
+Additionaly we will have to increase the resources that are requested by the fluentd Pods. In the default setting they use following requests and limits:
+```yaml
+- Limits:
+  - cpu: 1000m
+  - memory: 400M
+- Requests:
+  - cpu: 500m
+  - memory: 100M
+```
+
+In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. We do this by changing the Logging-CRD like follows:
+```yaml
+fluentd:
+  nodeSelector:
+    type: cpu
+  tolerations:
+  - effect: NoSchedule
+    key: type
+    operator: Equal
+    value: cpu
+  resources:
+    limits:
+      cpu: 5
+      memory: 2G
+    requests:
+      cpu: 5
+      memory: 2G
+```
+
+Lastly we can increase the number of fluentd-workers that are used per Pod:
+```yaml
+fluentd:
+  nodeSelector:
+    type: cpu
+  tolerations:
+  - effect: NoSchedule
+    key: type
+    operator: Equal
+    value: cpu
+  resources:
+    limits:
+      cpu: 5
+      memory: 2G
+    requests:
+      cpu: 5
+      memory: 2G
+  workers: 5
+```
\ No newline at end of file

From 261e404e1aec594ef820229680354a4694cb92cb Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Thu, 28 Sep 2023 10:01:09 +0200
Subject: [PATCH 2/9] Add buffers

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index 6b843012f..ee4d0d519 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -9,6 +9,7 @@ In specific scenarios, a fluentd with a single worker instance cannot process an
 When enabling the multi-worker setup, it is recommended to ensure the following things:
 - Place the fluentds on separate Nodes
 - Increase the compute and memory resources
+- Configure a buffer volume for fluentd
 - Do not use specific filter plugins
   - detectExceptions [is not working](https://github.com/kube-logging/logging-operator/issues/1490)
 - Also scale [horizontally](https://kube-logging.dev/docs/logging-infrastructure/fluentd/#autoscaling)
@@ -69,6 +70,37 @@ fluentd:
       memory: 2G
 ```
 
+The fluentd-Pods should receive their input and buffer them on their filesystem. After that the workers can pick the logs up, process and forward them to their final destination. For this we will have to configure a PVC and as a buffer volume:
+
+```yaml
+fluentd:
+  nodeSelector:
+    type: cpu
+  tolerations:
+  - effect: NoSchedule
+    key: type
+    operator: Equal
+    value: cpu
+  resources:
+    limits:
+      cpu: 5
+      memory: 2G
+    requests:
+      cpu: 5
+      memory: 2G
+  rootDir: /buffers
+  bufferStorageVolume:
+    pvc:
+      spec:
+        accessModes:
+        - ReadWriteOnce
+        resources:
+          requests:
+            storage: 40Gi
+        storageClassName: default
+        volumeMode: Filesystem
+```
+
 Lastly we can increase the number of fluentd-workers that are used per Pod:
 ```yaml
 fluentd:
@@ -86,5 +118,16 @@ fluentd:
     requests:
       cpu: 5
       memory: 2G
+  rootDir: /buffers
+  bufferStorageVolume:
+    pvc:
+      spec:
+        accessModes:
+        - ReadWriteOnce
+        resources:
+          requests:
+            storage: 40Gi
+        storageClassName: default
+        volumeMode: Filesystem
   workers: 5
 ```
\ No newline at end of file

From 7476d0dc7754c7e8645b500b80abcb61cabe63de Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Thu, 28 Sep 2023 10:18:41 +0200
Subject: [PATCH 3/9] Add buffers

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index ee4d0d519..7e0011d1c 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -70,7 +70,7 @@ fluentd:
       memory: 2G
 ```
 
-The fluentd-Pods should receive their input and buffer them on their filesystem. After that the workers can pick the logs up, process and forward them to their final destination. For this we will have to configure a PVC and as a buffer volume:
+The fluentd-Pods should receive their input and buffer them on their filesystem. After that, the workers can pick the logs up, process and forward them to their final destination. For this, we will have to configure a PVC and a buffer volume:
 
 ```yaml
 fluentd:

From d0d750c1178a97dc104fc84d8382520b51e74779 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 09:17:22 +0200
Subject: [PATCH 4/9] Update docs/multi-worker.md

Co-authored-by: Peter Wilcsinszky
Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index 7e0011d1c..f7b387e00 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -51,7 +51,7 @@ Additionaly we will have to increase the resources that are requested by the flu
   - memory: 100M
 ```
 
-In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. We do this by changing the Logging-CRD like follows:
+In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. We do this by changing the Logging resource as follows:
 ```yaml
 fluentd:
   nodeSelector:

From 378baefed390592ca8669c72ca365d669eccd689 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 09:17:29 +0200
Subject: [PATCH 5/9] Update docs/multi-worker.md

Co-authored-by: Peter Wilcsinszky
Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index f7b387e00..b30da5c40 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -29,7 +29,7 @@ spec:
     value: cpu
 ```
 
-The corresponding setting in the Logging-CRD is looking like follows:
+The corresponding setting in the Logging resource looks like follows:
 ```yaml
 fluentd:
   nodeSelector:

From 70683878b69ed06721c62ffe012a0c6ce35bb39d Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 13:51:25 +0200
Subject: [PATCH 6/9] Move rootDir to worker step

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index b30da5c40..24ddbfcb7 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -88,7 +88,6 @@ fluentd:
     requests:
       cpu: 5
       memory: 2G
-  rootDir: /buffers
   bufferStorageVolume:
     pvc:
       spec:
@@ -101,7 +100,7 @@ fluentd:
         volumeMode: Filesystem
 ```
 
-Lastly we can increase the number of fluentd-workers that are used per Pod:
+Lastly we can increase the number of fluentd-workers that are used per Pod and set the rootDir field. It is important that those two settings are changed together otherwise the fluentd process will not work correctly:
 ```yaml
 fluentd:
   nodeSelector:
@@ -118,7 +117,6 @@ fluentd:
     requests:
       cpu: 5
       memory: 2G
-  rootDir: /buffers
   bufferStorageVolume:
     pvc:
       spec:
@@ -130,4 +128,5 @@ fluentd:
         storageClassName: default
         volumeMode: Filesystem
   workers: 5
+  rootDir: /buffers
 ```
\ No newline at end of file

From 99df567b3468666d50b3b1cfda1add61f57fda67 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 13:54:48 +0200
Subject: [PATCH 7/9] Put full config to the end

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 84 +++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 51 deletions(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index 24ddbfcb7..171a06404 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -29,16 +29,15 @@ spec:
     value: cpu
 ```
 
-The corresponding setting in the Logging resource looks like follows:
+The corresponding setting in the FluentdSpec looks like follows:
 ```yaml
-fluentd:
-  nodeSelector:
-    type: cpu
-  tolerations:
-  - effect: NoSchedule
-    key: type
-    operator: Equal
-    value: cpu
+nodeSelector:
+  type: cpu
+tolerations:
+- effect: NoSchedule
+  key: type
+  operator: Equal
+  value: cpu
 ```
 
 Additionaly we will have to increase the resources that are requested by the fluentd Pods. In the default setting they use following requests and limits:
@@ -51,57 +50,40 @@ Additionaly we will have to increase the resources that are requested by the flu
   - memory: 100M
 ```
 
-In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. We do this by changing the Logging resource as follows:
+In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. It is necessary to set the following settings in the FluentdSpec:
 ```yaml
-fluentd:
-  nodeSelector:
-    type: cpu
-  tolerations:
-  - effect: NoSchedule
-    key: type
-    operator: Equal
-    value: cpu
-  resources:
-    limits:
-      cpu: 5
-      memory: 2G
-    requests:
-      cpu: 5
-      memory: 2G
+resources:
+  limits:
+    cpu: 5
+    memory: 2G
+  requests:
+    cpu: 5
+    memory: 2G
 ```
 
-The fluentd-Pods should receive their input and buffer them on their filesystem. After that, the workers can pick the logs up, process and forward them to their final destination. For this, we will have to configure a PVC and a buffer volume:
+The fluentd-Pods should receive their input and buffer them on their filesystem. After that, the workers can pick the logs up, process and forward them to their final destination. For this, we will have to configure a PVC and a buffer volume in the FluentdSpec:
 
 ```yaml
-fluentd:
-  nodeSelector:
-    type: cpu
-  tolerations:
-  - effect: NoSchedule
-    key: type
-    operator: Equal
-    value: cpu
-  resources:
-    limits:
-      cpu: 5
-      memory: 2G
-    requests:
-      cpu: 5
-      memory: 2G
-  bufferStorageVolume:
-    pvc:
-      spec:
-        accessModes:
-        - ReadWriteOnce
-        resources:
-          requests:
-            storage: 40Gi
-        storageClassName: default
-        volumeMode: Filesystem
+bufferStorageVolume:
+  pvc:
+    spec:
+      accessModes:
+      - ReadWriteOnce
+      resources:
+        requests:
+          storage: 40Gi
+      storageClassName: default
+      volumeMode: Filesystem
 ```
 
 Lastly we can increase the number of fluentd-workers that are used per Pod and set the rootDir field. It is important that those two settings are changed together otherwise the fluentd process will not work correctly:
 ```yaml
+workers: 5
+rootDir: /buffers
+```
+
+The full configuration of the Logging resource looks like follows:
+```yaml
 fluentd:
   nodeSelector:
     type: cpu

From 993a5a1f1404ac086fcd66b787b1e2addb6286e8 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 13:55:54 +0200
Subject: [PATCH 8/9] Make resources consistent

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index 171a06404..988a825fa 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -42,12 +42,13 @@ tolerations:
 
 Additionaly we will have to increase the resources that are requested by the fluentd Pods. In the default setting they use following requests and limits:
 ```yaml
-- Limits:
-  - cpu: 1000m
-  - memory: 400M
-- Requests:
-  - cpu: 500m
-  - memory: 100M
+resources:
+  limits:
+    cpu: 1
+    memory: 400M
+  requests:
+    cpu: 500m
+    memory: 100M
 ```
 
 In this short walkthrough, we will increase the fluentd workers from `1` to `5`. Therefore, we will multiply the requests and limits with factor 5 to ensure enough resources are reserved. Additionally, we will set requests and limits to the same values to ensure that the fluentd Pods are not affected by other workloads on the Node. This is, in general, a good practice. It is necessary to set the following settings in the FluentdSpec:

From 5c7cbe2489d2431af9e3377561ea1345b2eb9926 Mon Sep 17 00:00:00 2001
From: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
Date: Fri, 29 Sep 2023 13:58:10 +0200
Subject: [PATCH 9/9] Make resources consistent

Signed-off-by: Florian Stoeber <25154136+florianstoeber@users.noreply.github.com>
---
 docs/multi-worker.md | 26 --------------------------
 1 file changed, 26 deletions(-)

diff --git a/docs/multi-worker.md b/docs/multi-worker.md
index 988a825fa..1df921366 100644
--- a/docs/multi-worker.md
+++ b/docs/multi-worker.md
@@ -9,7 +9,6 @@ In specific scenarios, a fluentd with a single worker instance cannot process an
 When enabling the multi-worker setup, it is recommended to ensure the following things:
 - Place the fluentds on separate Nodes
 - Increase the compute and memory resources
-- Configure a buffer volume for fluentd
 - Do not use specific filter plugins
   - detectExceptions [is not working](https://github.com/kube-logging/logging-operator/issues/1490)
 - Also scale [horizontally](https://kube-logging.dev/docs/logging-infrastructure/fluentd/#autoscaling)
@@ -62,21 +61,6 @@ resources:
     memory: 2G
 ```
 
-The fluentd-Pods should receive their input and buffer them on their filesystem. After that, the workers can pick the logs up, process and forward them to their final destination. For this, we will have to configure a PVC and a buffer volume in the FluentdSpec:
-
-```yaml
-bufferStorageVolume:
-  pvc:
-    spec:
-      accessModes:
-      - ReadWriteOnce
-      resources:
-        requests:
-          storage: 40Gi
-      storageClassName: default
-      volumeMode: Filesystem
-```
-
 Lastly we can increase the number of fluentd-workers that are used per Pod and set the rootDir field. It is important that those two settings are changed together otherwise the fluentd process will not work correctly:
 ```yaml
 workers: 5
@@ -100,16 +84,6 @@ fluentd:
     requests:
       cpu: 5
       memory: 2G
-  bufferStorageVolume:
-    pvc:
-      spec:
-        accessModes:
-        - ReadWriteOnce
-        resources:
-          requests:
-            storage: 40Gi
-        storageClassName: default
-        volumeMode: Filesystem
   workers: 5
   rootDir: /buffers
 ```
\ No newline at end of file
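
For readers who want to try out the configuration described by this patch series: the `fluentd:` block shown in the final version of `docs/multi-worker.md` sits under `spec` of a `Logging` resource. Below is a minimal sketch of a complete manifest. It assumes the usual `logging.banzaicloud.io/v1beta1` API group of the logging-operator; the resource name `multi-worker-example` and the `controlNamespace` value `logging` are illustrative placeholders and should be adjusted to your installation.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: multi-worker-example   # illustrative name
spec:
  controlNamespace: logging    # assumption: namespace where the logging-operator runs
  fluentd:
    # keep the fluentds on the dedicated, compute-optimized nodes
    nodeSelector:
      type: cpu
    tolerations:
    - effect: NoSchedule
      key: type
      operator: Equal
      value: cpu
    # requests equal to limits, multiplied by the worker count
    resources:
      limits:
        cpu: 5
        memory: 2G
      requests:
        cpu: 5
        memory: 2G
    # workers and rootDir are changed together, as described above
    workers: 5
    rootDir: /buffers
```

With `workers: 5` and `rootDir: /buffers` set together, the five fluentd worker processes share the buffer directory tree under `/buffers`, which is why the patch series moves `rootDir` into the same step as the worker count.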