I just slowly incremented prometheus-server's memory request to 20 GB for the pangeo-hubs cluster. It appears 18 GB wasn't sufficient: memory peaked at close to 19 GB before falling back to ~3-4 GB when `Head GC completed` was logged, roughly 5 minutes after startup.
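For context, the startup peak can be compared against the request by watching the container's live memory use with something like the sketch below; the label selector is an assumption about how the prometheus chart labels the server pod.

```bash
# Watch live memory use of the server during startup, to compare the WAL-replay
# peak against the configured request. The label selector is an assumption about
# how the prometheus chart labels the server pod.
kubectl -n support top pod -l app=prometheus,component=server --containers
```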
This prometheus-server had a `/data` folder, mounted from the attached PVC, containing 5.8 GB:
```console
$ kubectl exec -n support deploy/support-prometheus-server -c prometheus-server -- du -sh /data
5.8G /data
```
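To see how much of that is the WAL itself rather than persisted blocks, something like this should work, assuming the image's busybox shell and the default TSDB layout with a `wal/` subdirectory:

```bash
# Break /data down into WAL vs persisted blocks; sh -c is needed so the glob
# expands inside the container (there is no shell around a plain kubectl exec).
kubectl exec -n support deploy/support-prometheus-server -c prometheus-server -- \
  sh -c 'du -sh /data/*'
```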
The problem we have is that the write-ahead log (WAL) is read back from disk during startup to rebuild the in-memory state from all recently collected metrics, as I understand it, and that replay takes a lot of memory. Worse, we can't know this memory requirement ahead of time, because it grows as more metrics are collected.
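One way to get a feel for how big the in-memory head (and therefore the WAL replay) has grown is to look at the number of active series. A minimal sketch, assuming the default port 9090 and a Prometheus recent enough to expose the TSDB status API:

```bash
# Port-forward to the server for the queries below.
kubectl -n support port-forward deploy/support-prometheus-server 9090:9090 &

# Head statistics, including numSeries (endpoint exists in Prometheus >= 2.15).
curl -s http://localhost:9090/api/v1/status/tsdb

# The same number is exported as a metric, so it can also be graphed over time.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=prometheus_tsdb_head_series'
```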
Ideas
We upper-bound the WAL size on disk instead of the age of collected metrics
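Prometheus ships a size-based retention flag alongside the time-based one, which is probably the closest existing knob; as I read the docs, the WAL counts toward the limit but only persisted blocks are deleted to enforce it. A hedged sketch of wiring it up via the upstream chart; the release name and the `server.retentionSize` key are assumptions about our setup:

```bash
# Hypothetical: cap total TSDB size on disk (maps to --storage.tsdb.retention.size)
# instead of, or in addition to, the time-based retention. Release/chart names and
# the server.retentionSize value key are assumptions.
helm upgrade support prometheus-community/prometheus \
  --namespace support \
  --reuse-values \
  --set server.retentionSize=10GB
```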
@yuvipanda I suspect a basic relation between WAL size and memory use during startup, where the WAL size would depend on the amount of metrics collected, I assume. The amount of metrics is coupled to what's being scraped, and more scraped endpoints means more data, such as one node-exporter per node, including one per dask worker node.
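To see which scrape jobs actually drive the series count (and with it the WAL and replay size), queries like these can be run against the API, reusing the port-forward from the sketch above; the second query walks the whole head, so it can be slow on a large server:

```bash
# Number of up targets per scrape job (more nodes / dask workers => more targets).
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=count by (job) (up)'

# Rough count of active series per job; a heavy query, use with care.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=count by (job) ({__name__=~".+"})'
```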
I think the approach of limiting the amount of metrics consumed is relevant, but I'll go ahead and close this issue now; the other ideas were explored a bit.