Need metrics to know last successful backup by schedule #5398

Closed
Ahmad-Faizan opened this issue Sep 26, 2022 · 1 comment
Labels
kind/requirement Metrics Related to prometheus metrics

Comments

@Ahmad-Faizan
Contributor

Ahmad-Faizan commented Sep 26, 2022

Describe the problem/challenge you have
We are trying to build an alert that fires, per schedule, when the last backup failed. The current metric velero_backup_last_successful_timestamp exposes only the timestamp of the last successful backup per schedule. It is difficult to write the alert with this metric alone, because all we need to know is whether the most recent backup for each schedule succeeded.

Describe the solution you'd like
We would like a gauge metric such as velero_backup_last_status.
The metric would report 1 if the last backup succeeded and 0 if it failed, with the schedule exposed as a label. Today the only related metric is velero_backup_last_successful_timestamp, which records successes only. What is missing is whether the last backup attempt succeeded or failed, which the new metric would expose.
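
To make the proposal concrete, here is a minimal Python sketch of the gauge semantics we have in mind. It is a toy model, not Velero code: the class `LastStatusGauge`, its methods, and the schedule names are hypothetical, and the only assumption taken from Prometheus is the usual `name{label="value"} sample` text form.

```python
from dataclasses import dataclass, field


@dataclass
class LastStatusGauge:
    """Toy model of the proposed gauge: one sample per schedule label."""

    name: str = "velero_backup_last_status"
    values: dict = field(default_factory=dict)

    def record(self, schedule: str, succeeded: bool) -> None:
        # Overwrite rather than accumulate: only the latest attempt matters.
        self.values[schedule] = 1 if succeeded else 0

    def render(self) -> str:
        # One line per schedule, in Prometheus text exposition style.
        return "\n".join(
            f'{self.name}{{schedule="{schedule}"}} {value}'
            for schedule, value in sorted(self.values.items())
        )

    def failing(self) -> list:
        # Schedules whose most recent backup attempt failed.
        return sorted(s for s, v in self.values.items() if v == 0)


gauge = LastStatusGauge()
gauge.record("daily", True)    # hypothetical schedule names
gauge.record("weekly", False)
print(gauge.render())
```

With a metric shaped like this, the alert condition reduces to "value equals 0 for a given schedule", and it clears on its own once the next backup for that schedule succeeds and overwrites the sample.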

Anything else you would like to add:
We have submitted an upstream PR, #5397, which can be extended further once Velero exposes this metric. It would then be able to fire an alert when a backup fails for a specific schedule.

The alert should stop firing as soon as the backup is created for that schedule. Any suggestions are welcome on how to correctly approach this as we have multiple schedules creating backups twice a day, daily, and weekly.
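
For comparison, the following sketch shows roughly what an alert has to do when only velero_backup_last_successful_timestamp is available: it must know every schedule's expected interval and treat an old timestamp as a failure. The schedule names, the interval table, and the one-hour grace period are assumptions made up for illustration.

```python
# Hypothetical schedules and their expected intervals, in seconds.
EXPECTED_INTERVAL_SECONDS = {
    "twice-daily": 12 * 3600,
    "daily": 24 * 3600,
    "weekly": 7 * 24 * 3600,
}


def overdue_schedules(last_success, now, grace_seconds=3600):
    """Return schedules whose last successful backup is older than expected.

    last_success maps schedule name -> Unix timestamp of the last success,
    mirroring what the existing timestamp metric reports per schedule.
    """
    overdue = []
    for schedule, interval in EXPECTED_INTERVAL_SECONDS.items():
        timestamp = last_success.get(schedule)
        # A schedule with no recorded success is treated as overdue too.
        if timestamp is None or now - timestamp > interval + grace_seconds:
            overdue.append(schedule)
    return sorted(overdue)
```

The drawback is that the interval table has to be kept in sync with the schedules by hand, and a failure is only noticed after a full interval has elapsed, which is why a per-schedule status metric would be simpler to alert on.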

Environment:

  • Velero version (use velero version): v1.8.1
  • Kubernetes version (use kubectl version): 1.20.10
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.2 LTS (kernel version : 5.8.0-1041-aws)

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@allenxu404
Contributor

allenxu404 commented Jan 10, 2023

I think adding a velero_backup_last_status metric with a schedule label would be helpful, especially for scheduled backups. Currently Velero lacks a metric that shows the status of a specific backup.
