Description
So far, the self-monitor has only been run on small clusters with the development release. To ensure proper scaling and resource usage, a stress test on a large-scale cluster is needed. The test should be repeatable so that new features can be re-tested.
Criteria
Verify the self-monitor on a large node/gateway setup
The self-monitor receives ~10 MB of data in each scrape loop. Over time, this can lead to OOM kills when the garbage collector does not run early enough.
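A simplified illustration of why the GC may not run in time: with the default GOGC=100, the Go runtime schedules the next collection when the heap has grown to roughly twice the live heap left after the previous cycle. The sketch below is a rough model only. It ignores goroutine stacks, globals and other runtime overhead, and it treats GOMEMLIMIT as a direct cap on the heap goal, although the real limit covers all runtime-managed memory. The 50 MiB live heap is a hypothetical value, not a measurement from this test.

```go
package main

import "fmt"

const mib = int64(1) << 20

// heapGoal approximates the heap size at which the next GC cycle is due.
// Simplified model: goal = live heap * (1 + GOGC/100), ignoring stacks,
// globals and other runtime overhead. If memLimit > 0, the goal is capped
// at memLimit, which only roughly models the effect of GOMEMLIMIT.
func heapGoal(liveHeap, gogc, memLimit int64) int64 {
	goal := liveHeap + liveHeap*gogc/100
	if memLimit > 0 && goal > memLimit {
		goal = memLimit
	}
	return goal
}

func main() {
	containerLimit := 90 * mib
	live := 50 * mib // hypothetical live heap after several ~10 MB scrapes

	noLimit := heapGoal(live, 100, 0)
	withLimit := heapGoal(live, 100, 72*mib)

	fmt.Printf("GOGC=100 only:         next GC at ~%d MiB (container limit %d MiB)\n", noLimit/mib, containerLimit/mib)
	fmt.Printf("GOGC=100 + GOMEMLIMIT: next GC at ~%d MiB\n", withLimit/mib)
}
```

In this model, the unconstrained goal (100 MiB) lies above the 90Mi container limit, so the container can be killed before the next collection starts, while a soft limit pulls the collection point below the container limit.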
Test manually with GOMEMLIMIT configured, so that the Go runtime forces garbage collection once memory use approaches a given limit (tentatively 80% of the memory limit).
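For the 90Mi limit used in this test, 90Mi is 94,371,840 bytes, and 80% of that is 75,497,472 bytes, which is exactly 72 MiB. The sketch below shows the arithmetic and the programmatic equivalent of the environment variable, `runtime/debug.SetMemoryLimit` (Go 1.19+). The helper name `softLimitBytes` is hypothetical and not taken from the self-monitor code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

const mebibyte int64 = 1 << 20

// softLimitBytes returns percent% of a container memory limit. Integer
// arithmetic (truncating) avoids floating-point rounding.
func softLimitBytes(containerLimitBytes, percent int64) int64 {
	return containerLimitBytes * percent / 100
}

func main() {
	containerLimit := 90 * mebibyte // the 90Mi limit of the deployment
	soft := softLimitBytes(containerLimit, 80)

	// Same effect as starting the process with GOMEMLIMIT=72MiB.
	// SetMemoryLimit returns the previously configured limit.
	previous := debug.SetMemoryLimit(soft)

	fmt.Printf("container limit: %d bytes\n", containerLimit)
	fmt.Printf("soft limit:      %d bytes (%d MiB)\n", soft, soft/mebibyte)
	fmt.Printf("previous limit:  %d bytes\n", previous)
}
```

In a deployment, setting the `GOMEMLIMIT` environment variable (which accepts suffixes such as `MiB`) is usually simpler than changing code. GOMEMLIMIT is a soft limit: it makes the GC run more aggressively but does not prevent an OOM kill if the live data itself exceeds the container limit.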
After setting the GOMEMLIMIT environment variable to 80% of the deployment's configured 90Mi memory limit, the pod was more stable, but this setting alone was not sufficient. Further pprof analysis with GOMEMLIMIT set shows ~66 MB of resident memory for the test instance.
Conclusion
Update the white-listed metrics to reduce the local DB size and resident memory (done)
Update the memory limit to match the new settings (the configured 90Mi is too low)
Test and tune the configuration on a large cluster (more than 100 nodes)
Release Notes
Criteria
The ephemeral storage has a limit that is not exceeded, even under high load.