Agones controller metrics becomes a huge amount of data over time #2424
Comments
This is absolutely a cardinality bomb! I think the best solution would be to actually remove the label reference to the node name. I'm not sure it adds any value to the allocation metrics anyway.
I am in favor of removing the node name label.
@markmandel
That sounds great! The only caveat I have is that I would like to double-check whether it breaks anything in the Grafana dashboards before closing out this issue (but you don't have to wait on a PR for that; it can be checked separately). Also, if you find any other cardinality bombs in the metrics, please file separate issues for them. A lot of the metrics were written a long time ago and might need a review.
Any objection to closing this issue, since we solved it in #2433?
Yes, there is no problem.
Is your feature request related to a problem? Please describe.
The Agones controller metric `agones_gameserver_allocations_duration_seconds` grows into a huge amount of data over time.

`agones_gameserver_allocations_duration_seconds` has a node_name label, but in environments such as GKE, where the number of nodes increases or decreases dynamically, the cardinality of node_name increases over time. As a result, `agones_gameserver_allocations_duration_seconds` in the /metrics API response of the Agones controller will have a huge number of rows.

For example, if 10 new nodes are created every day, there will be more than 12000 rows after 100 days:

10 [instances] * 100 [days] * 12 [distribution] = 12000 [rows]

If this situation continues, sending metrics to Google Cloud's Cloud Monitoring and other services will cost a huge amount of money.
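The estimate above can be reproduced with a small calculation. This is only a sketch: the function and parameter names are hypothetical, and 12 is taken from the figure above as the number of rows each node_name value contributes.

```python
def estimate_metric_rows(nodes_per_day: int, days: int, rows_per_node: int) -> int:
    """Rows exposed for one metric when every new node adds its own label value.

    Assumes rows for old nodes are never removed, which is the situation
    described in this issue.
    """
    return nodes_per_day * days * rows_per_node


rows = estimate_metric_rows(nodes_per_day=10, days=100, rows_per_node=12)
print(rows)  # 12000
```

The row count grows linearly with the number of nodes ever created, not with the number of nodes currently running.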
Describe the solution you'd like
The Agones controller clears old `agones_gameserver_allocations_duration_seconds` data after a certain amount of time has passed. It would be nice to be able to specify this duration in the Helm values.
Maybe there is a better way, but I don't have any ideas.
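To make the request concrete, here is a rough sketch of the idea: remember when each node_name value was last observed, and drop values that have been idle longer than a configurable TTL. This is not Agones code and does not use any real metrics library; the class, its methods, and the `ttl_seconds` parameter are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, List


class ExpiringLabelSeries:
    """Keeps observations per node_name and forgets labels idle longer than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be checked without waiting
        self._observations: Dict[str, List[float]] = {}
        self._last_seen: Dict[str, float] = {}

    def observe(self, node_name: str, duration_seconds: float) -> None:
        """Record one allocation duration and refresh the label's last-seen time."""
        self._observations.setdefault(node_name, []).append(duration_seconds)
        self._last_seen[node_name] = self._clock()

    def prune(self) -> List[str]:
        """Drop every label not observed within the TTL and return the dropped names."""
        now = self._clock()
        expired = [name for name, seen in self._last_seen.items() if now - seen > self._ttl]
        for name in expired:
            del self._observations[name]
            del self._last_seen[name]
        return expired

    def labels(self) -> List[str]:
        """Label values that would still be exported."""
        return sorted(self._observations)


# Usage with a fake clock: node-a goes idle, node-b stays recent.
fake_now = [0.0]
series = ExpiringLabelSeries(ttl_seconds=3600, clock=lambda: fake_now[0])
series.observe("node-a", 0.12)
fake_now[0] = 3000.0
series.observe("node-b", 0.30)
fake_now[0] = 4000.0
removed = series.prune()
print(removed)          # ['node-a']
print(series.labels())  # ['node-b']
```

In a setup like this, the TTL would be the value exposed through Helm. The sketch only bounds the number of exported rows by recently active nodes; it does not reduce cardinality per node.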