Add metrics with, at least, success rate #13
One open question, which I can't look up right now but which may need to be solved, is whether the monitoring endpoints can be exposed on a wide enough interface/port for Prometheus to actually scrape them.
We should be able to scrape metrics from this agent through APIServer -> kubelet -> pod identity agent.
Excellent, thank you for the response. We'll keep this ticket updated as we approach this. It looks like we'll be able to pick up the work around the September-October timeframe.
Hi :)
Hi! We're interested in onboarding Pod Identity for our clusters. As we plan our installation, we're concerned about the lack of observability into the agent, which may affect our ability to operate the system at scale. If I'm reading the code correctly, the only signals available to us as consumers of the agent are the `/healthz` and `/readyz` endpoints (both of which lead to the same probe). Given how critical the system will be as we onboard it, one further level of detail would be valuable. In the best case, that would be the success rate per running agent (since, if I understand the code, the agent is largely an HTTP service).
One thing we could implement is a simple Prometheus/OpenMetrics endpoint exposing just counts of 2xx/3xx/4xx/5xx responses (using the default Go Prometheus client), which would give us the lion's share of what we need for observability. It could go deeper into other facets later, but... baby steps ;). If we had some confidence that these base metrics could be accepted upstream, we could potentially take on the implementation ourselves.
Alternatively, these metrics could go to CloudWatch or something similar, but that's a newer area for me, so I don't know what that would look like.