Enhance metrics endpoint #189

Closed
3 tasks done
prashanth26 opened this issue Nov 20, 2018 · 5 comments · Fixed by #195
Labels
area/monitoring Monitoring (including availability monitoring and alerting) related effort/1d Effort for issue is around 1 day kind/enhancement Enhancement, improvement, extension topology/seed Affects Seed clusters

Comments

@prashanth26
Contributor

prashanth26 commented Nov 20, 2018

Issue

The metrics endpoint currently exposes only the machine objects managed by the machine controller manager (MCM). It would be more useful if it also tracked MCM freezing and the number of API calls.

Solution

The metrics endpoint should expose more detailed metrics for MCM, such as:

  • # of machines, machineSet, machineDeployment
  • # of API calls to the cloud provider
  • Freeze status of MCM
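
As a rough illustration of what the endpoint could return for the three items above, here is a minimal sketch that renders them in the Prometheus text exposition format using only the Go standard library. The `Snapshot` type and all metric names (`mcm_machines`, `mcm_cloud_api_calls_total`, `mcm_frozen`, and so on) are assumptions made up for this example, not names used by MCM.

```go
package main

import (
	"fmt"
	"strings"
)

// Snapshot holds the values from the list above.
// The type and its field names are hypothetical.
type Snapshot struct {
	Machines           int
	MachineSets        int
	MachineDeployments int
	CloudAPICalls      int64
	Frozen             bool
}

// writeMetric appends one unlabeled metric in the Prometheus text format:
// a HELP line, a TYPE line, and a sample line.
func writeMetric(b *strings.Builder, name, kind, help string, value int64) {
	fmt.Fprintf(b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(b, "# TYPE %s %s\n", name, kind)
	fmt.Fprintf(b, "%s %d\n", name, value)
}

// Render returns the snapshot as text that a Prometheus server could scrape.
// Metric names are illustrative assumptions.
func Render(s Snapshot) string {
	var b strings.Builder
	writeMetric(&b, "mcm_machines", "gauge", "Number of machine objects.", int64(s.Machines))
	writeMetric(&b, "mcm_machine_sets", "gauge", "Number of machineSet objects.", int64(s.MachineSets))
	writeMetric(&b, "mcm_machine_deployments", "gauge", "Number of machineDeployment objects.", int64(s.MachineDeployments))
	writeMetric(&b, "mcm_cloud_api_calls_total", "counter", "API calls made to the cloud provider.", s.CloudAPICalls)

	// Booleans are conventionally exposed as a 0/1 gauge.
	frozen := int64(0)
	if s.Frozen {
		frozen = 1
	}
	writeMetric(&b, "mcm_frozen", "gauge", "1 if MCM is frozen, 0 otherwise.", frozen)
	return b.String()
}

func main() {
	fmt.Print(Render(Snapshot{Machines: 3, MachineSets: 1, MachineDeployments: 1, CloudAPICalls: 42}))
}
```

Exposing the freeze status as a 0/1 gauge keeps it easy to alert on, for example by firing when the value stays at 1 for some time.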
@prashanth26 prashanth26 added kind/enhancement Enhancement, improvement, extension size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) area/monitoring Monitoring (including availability monitoring and alerting) related topology/seed Affects Seed clusters labels Nov 20, 2018
@fsniper
Contributor

fsniper commented Nov 20, 2018

I was also working on this issue. Maybe the metrics in the list below can also be considered? I am ready to start on this and add some metrics if you have not already started working on it.

* crd:machinedeployment: count, age|created, availableReplicas, readyReplicas, replicas, updatedReplicas
* crd:machineset: count, availableReplicas, failedMachines, fullyLabeledReplicas, readyReplicas, replicas, age|created
* crd:machine: age|created, count, conditions, status
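
To make the per-object metrics above more concrete, here is a minimal sketch of how a single machineset could be turned into labeled samples. It assumes the "age|created" item is exposed as a creation timestamp in Unix seconds, so that age can be derived at query time. The `MachineSetInfo` type, its fields, and the metric names are hypothetical, and quoting label values with `%q` is only adequate here because Kubernetes object names contain no characters that need escaping.

```go
package main

import "fmt"

// MachineSetInfo is a hypothetical, trimmed-down view of one machineset.
type MachineSetInfo struct {
	Name           string
	CreatedUnix    int64 // creation time in Unix seconds
	Replicas       int
	ReadyReplicas  int
	FailedMachines int
}

// Lines returns one sample line per field, each labeled with the object name.
// Metric names are illustrative assumptions.
func (m MachineSetInfo) Lines() []string {
	label := fmt.Sprintf("{name=%q}", m.Name)
	return []string{
		fmt.Sprintf("mcm_machineset_created%s %d", label, m.CreatedUnix),
		fmt.Sprintf("mcm_machineset_replicas%s %d", label, m.Replicas),
		fmt.Sprintf("mcm_machineset_ready_replicas%s %d", label, m.ReadyReplicas),
		fmt.Sprintf("mcm_machineset_failed_machines%s %d", label, m.FailedMachines),
	}
}

func main() {
	ms := MachineSetInfo{
		Name:           "worker-a",
		CreatedUnix:    1542672000,
		Replicas:       3,
		ReadyReplicas:  2,
		FailedMachines: 1,
	}
	for _, line := range ms.Lines() {
		fmt.Println(line)
	}
}
```

Publishing the creation timestamp rather than a precomputed age means the value never goes stale between scrapes.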

@prashanth26
Contributor Author

prashanth26 commented Nov 20, 2018

Hi @fsniper ,

We are happy to have you pitch in, and you are more than welcome to take this issue up. The metrics you have mentioned above make sense to me. In addition, it would be great if you could track the number of API calls to the cloud provider (perhaps by keeping a global atomic integer counter?) and also expose the freeze status of the controller.

Thanks & Regards,
Prashanth
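
The "global atomic integer counter" idea mentioned above could look roughly like the following sketch, which uses `sync/atomic` from the Go standard library. The `CallCounter` type and the `cloudAPICalls` variable are hypothetical names chosen for illustration. Where the increment would be placed in the cloud provider drivers is not covered here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CallCounter counts events and is safe for concurrent use.
// The type name is hypothetical.
type CallCounter struct {
	n int64
}

// Inc adds one to the counter.
func (c *CallCounter) Inc() {
	atomic.AddInt64(&c.n, 1)
}

// Value returns the current count.
func (c *CallCounter) Value() int64 {
	return atomic.LoadInt64(&c.n)
}

// cloudAPICalls is the process-wide counter (hypothetical name).
// Each call to the cloud provider would invoke cloudAPICalls.Inc().
var cloudAPICalls CallCounter

func main() {
	var wg sync.WaitGroup
	// Simulate 100 concurrent cloud provider calls.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cloudAPICalls.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(cloudAPICalls.Value()) // prints 100
}
```

An atomic counter avoids taking a mutex on every API call, which matters little at this call rate but keeps the instrumentation cheap and simple.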

@fsniper
Contributor

fsniper commented Nov 23, 2018

  • # of API calls to the cloud provider
  • # of machines, machineSet, machineDeployment
  • machine info, created (age), conditions, phase
  • machineset info, created (age), status, failed_machines
  • machinedeployment: info, created (age), status, conditions, failed_machines
  • Freeze status of MCM

@gardener-robot-ci-1 gardener-robot-ci-1 added the status/accepted Issue was accepted as something we need to work on label Dec 18, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 removed the status/accepted Issue was accepted as something we need to work on label Jan 6, 2019
@prashanth26
Contributor Author

Reopening the issue to take a second look at these metrics and see how they can help with alerting.

@prashanth26
Contributor Author

prashanth26 commented Feb 20, 2019

@gardener-robot gardener-robot added effort/1d Effort for issue is around 1 day and removed size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Mar 8, 2021

4 participants