fix(alerts) : Alert if there was no successful eth-watcher iterations for 5 minutes #3299

Open: 1 commit to be merged into main

RomanBrodetski
Collaborator

We have this alert:

      no_eth_watch_iterations:
        promql: max(rate(server_eth_watch_eth_poll[30s]))
        compare: "=="
        thresholds:
          critical: 0
        duration: 5m
        annotations:
          description: "No ETH watch iterations for 5 minutes"
          summary: "No ETH watch iterations for 5 minutes"
          runbook_url: "https://www.notion.so/matterlabs/On-Call-1d597d29c9b64c08919bf53c952cd03e?pvs=4#78da3253ab3e41aa8175e72caf456c86"

However, we reported the server_eth_watch_eth_poll metric on every iteration, even an unsuccessful one. This PR changes the logic so that the metric is reported only when an iteration succeeds, meaning that we will be alerted if there were no successful iterations for five minutes.
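
A minimal sketch of the intended behaviour, not the actual eth-watcher code: `EthPollCounter`, `poll_once`, and `run_iteration` are hypothetical names, and a plain integer stands in for the real server_eth_watch_eth_poll metric. The counter advances only when an iteration returns `Ok`, so a run of failures leaves its rate at zero and the alert condition holds.

```rust
// Hypothetical stand-in for the `server_eth_watch_eth_poll` counter.
#[derive(Default)]
struct EthPollCounter {
    value: u64,
}

// Hypothetical single poll of the L1 provider; `provider_ok` simulates its health.
fn poll_once(provider_ok: bool) -> Result<(), String> {
    if provider_ok {
        Ok(())
    } else {
        Err("L1 provider unavailable".to_string())
    }
}

// Runs one watcher iteration and reports the metric only on success.
fn run_iteration(counter: &mut EthPollCounter, provider_ok: bool) -> Result<(), String> {
    poll_once(provider_ok)?; // on failure, return early without touching the counter
    counter.value += 1;
    Ok(())
}

fn main() {
    let mut counter = EthPollCounter::default();
    for ok in [true, false, false, true] {
        if let Err(err) = run_iteration(&mut counter, ok) {
            eprintln!("iteration failed: {err}");
        }
    }
    println!("successful polls: {}", counter.value);
}
```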

We may want to increase the duration to something like 10 minutes to avoid being notified about transient L1 provider issues.

Contributor

github-actions bot commented Nov 18, 2024

Hey there! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.
Examples of valid PR titles:

  • feat(eth_sender): Support new transaction type
  • fix(state_keeper): Correctly handle edge case
  • ci: Add new workflow for linting

Details:

No release type found in pull request title "fix(alerts) : Alert if there was no successful eth-watcher iterations for 5 minutes". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

RomanBrodetski changed the title from "Alert if there was no successful eth-watcher iterations for 5 minutes" to "fix(alerts) : Alert if there was no successful eth-watcher iterations for 5 minutes" on Nov 18, 2024
Contributor

Deniallugo left a comment

I think these should be two independent metrics.
In this particular case, we are actually polling Ethereum: nothing is stuck and the service is working.

But you want to check that we poll it successfully, and I'd add a new metric for that.
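
One way the two-metric split could look, as a rough sketch only: `WatcherMetrics`, `poll_attempts`, and `poll_successes` are illustrative names, not existing metrics in the codebase, and plain integers stand in for real counters.

```rust
// Hypothetical pair of counters; the names are illustrative, not existing metrics.
#[derive(Default, Debug)]
struct WatcherMetrics {
    poll_attempts: u64,  // bumped on every iteration: shows the loop is alive
    poll_successes: u64, // bumped only on success: shows L1 polling works
}

impl WatcherMetrics {
    // Records the outcome of one iteration on both counters.
    fn record<T, E>(&mut self, outcome: &Result<T, E>) {
        self.poll_attempts += 1;
        if outcome.is_ok() {
            self.poll_successes += 1;
        }
    }
}

fn main() {
    let mut metrics = WatcherMetrics::default();
    let outcomes: [Result<(), &str>; 3] = [Ok(()), Err("timeout"), Err("timeout")];
    for outcome in &outcomes {
        metrics.record(outcome);
    }
    println!("{metrics:?}");
}
```

With a split like this, an alert on the attempts counter would catch a stuck watcher, while a separate alert on the successes counter would catch persistent L1 provider failures.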
