kube-prometheus-stack - Retention problems #4869

brancomrt · 2024-09-20T20:50:33Z

Describe the bug (a clear and concise description of what the bug is)

I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.

I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.

What's your helm version?

version.BuildInfo{Version:"v3.14.4", GitCommit:"81c902a123462fd4052bc5e9aa9c513c4c8fc142", GitTreeState:"clean", GoVersion:"go1.21.9"}

What's your kubectl version?

Client Version: v1.27.10
Kustomize Version: v5.0.1
Server Version: v1.28.12+rke2r1

Which chart?

kube-prometheus-stack

What's the chart version?

61.7.1

What happened?

I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.

I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.

What you expected to happen?

Prometheus storage data on the PVC is cleaned up automatically once it is older than the configured retention period.

How to reproduce it?

Wait for the retention period defined in values.yaml to elapse, then check whether the storage usage of the PVC prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 decreases.

Enter the changed values of values.yaml?

prometheus.prometheusSpec.retention
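For reference, a minimal values.yaml override for this setting might look like the sketch below. The 1h value is simply one of the test values mentioned above, not a recommendation; the comment about the resulting flag is based on the StatefulSet args quoted later in this thread.

```yaml
prometheus:
  prometheusSpec:
    # Rendered by the operator as --storage.tsdb.retention.time=1h
    # on the Prometheus container.
    retention: 1h
```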

Enter the command that you executed that is failing/malfunctioning.

helm upgrade kube-prometheus-stack -n monitoring ./

The chart is upgraded from a local directory, using its local values.yaml.

Anything else we need to know?

No response

brancomrt · 2024-09-22T19:12:36Z

I am using a storage class that stores data on NFS.

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: "nfs-client"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi

kubectl get storageclasses.storage.k8s.io

NAME         PROVISIONER                                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client   cluster.local/nfs-subdir-external-provisioner   Delete          Immediate           true                   131d

chanakya-svt · 2024-09-29T17:16:51Z

@brancomrt I am also facing the same issue with retention. I set my retention to 15m, but the metrics are not cleared and the WAL keeps growing, filling my disk to the point that I am losing metrics because of "no space left on device".

Were you able to resolve this?

TIA

Below are the args passed to Prometheus v2.54.1 in my StatefulSet:

--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics
--log.level=debug
--storage.tsdb.retention.time=15m
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml

chanakya-svt · 2024-09-30T13:36:50Z

It was mentioned here in a comment that it's resolved in v2.21, but I am using v2.54 and the issue still persists.

DrFaust92 · 2024-10-04T00:20:39Z

I can't find an exact reference for this, but because blocks are compacted every 2 hours by default, you cannot set retention below that value without changing several other parameters as well.

Regardless, this ticket belongs upstream in Prometheus / prometheus-operator, not in the chart repo.
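Following that reasoning, a test configuration that stays above the default 2h block range might look like the sketch below. The 6h value is an arbitrary assumption chosen only to make cleanup observable within a working day; it is not taken from the thread.

```yaml
prometheus:
  prometheusSpec:
    # Assumption: keep retention well above the default 2h block range.
    # Retention only deletes persisted blocks, so the head/WAL still holds
    # the most recent couple of hours of data regardless of this value.
    retention: 6h
```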

brancomrt · 2024-10-04T12:56:59Z

Thank you @DrFaust92

rouke-broersma · 2024-10-07T08:14:56Z

This should be closed because it is not a bug but rather a limitation of the default Prometheus configuration.

chanakya-svt · 2024-10-07T14:31:58Z

With the following args, I see that max-block-duration is set to 6m and min-block-duration is set to 2h (see the attached screenshot). The durations look backwards, retention is not being applied, and the WAL keeps growing.

But when I pass storage.tsdb.min-block-duration=1h and storage.tsdb.max-block-duration=2h as additional args, I see the WAL being compacted every 1h or when it reaches 256MB (the size limit in my case).

I am not sure whether the chart is setting these defaults or whether it's an upstream Prometheus issue.

--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics
--log.level=info
--storage.tsdb.retention.time=1h
--storage.tsdb.retention.size=256MB
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml
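The additional args described in this comment could be set from values.yaml roughly as sketched below. This assumes the chart forwards prometheus.prometheusSpec.additionalArgs to the Prometheus custom resource, where the operator appends each name/value pair to the container args as-is. These are hidden Prometheus flags, and the operator fails reconciliation if one of them conflicts with a flag it already sets itself. The 6m value in the screenshot is consistent with Prometheus defaulting max-block-duration to 10% of retention.time when that flag is not given.

```yaml
prometheus:
  prometheusSpec:
    retention: 1h          # --storage.tsdb.retention.time=1h
    retentionSize: 256MB   # --storage.tsdb.retention.size=256MB
    # Without explicit values, max-block-duration defaults to 10% of
    # retention.time (1h -> 6m), below the 2h min-block-duration default.
    additionalArgs:
      - name: storage.tsdb.min-block-duration
        value: 1h
      - name: storage.tsdb.max-block-duration
        value: 2h
```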

rouke-broersma · 2024-10-07T15:01:20Z

@chanakya-svt a minimum block duration that is longer than the maximum block duration doesn't make sense.

chanakya-svt · 2024-10-07T17:41:02Z

@rouke-broersma I looked through the chart to see whether it passes any args that could cause this, but I couldn't pinpoint anything. Can you confirm whether this is an upstream Prometheus issue? If so, I can open an issue in the Prometheus repo. Thank you.

mehrdadpfg · 2024-10-08T23:45:33Z

We have the same issue with 2.51.

brancomrt added the bug Something isn't working label Sep 20, 2024