Alerting rabbitmq metrics in datadog #3482

Open
2 of 3 tasks
lisac opened this issue Sep 19, 2024 · 2 comments
Labels: backlog (To be addressed in a future sprint), VRO-team

Comments

@lisac
Contributor

lisac commented Sep 19, 2024

Important

Respond by 9/24 to https://github.com/department-of-veterans-affairs/lighthouse-di-tenant-support/issues/39 (add a comment to the ticket) on whether we will pursue next steps and, if so, which engineer will work with Tyler.

User Story

As a VRO engineer, I would like the ability to view RabbitMQ metrics in Datadog at the pod level (per environment), so that I can better troubleshoot issues and refine failover policies. At this time, we can only monitor RabbitMQ connections globally per VRO application, so we can't pinpoint which environment(s) are affected when RabbitMQ connections drop.

Acceptance Criteria

  • 1. RabbitMQ metrics are viewable in Datadog at the pod level, broken down by environment (e.g., RabbitMQ in dev, RabbitMQ in prod), to allow more granular monitoring of RabbitMQ connections.
  • 2. Datadog alerts are configured to notify the VRO team on Slack when the RabbitMQ connection is lost in a given environment, with details identifying the affected environment(s) to aid in troubleshooting.
  • 3. A separate dashboard is created for these RabbitMQ metrics.
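
As a rough illustration of criterion 2, here is a minimal sketch of a per-environment monitor definition, written as a plain Python dict that could later be submitted to Datadog. Everything specific in it is an assumption rather than something confirmed in this issue: the metric name `rabbitmq.connections`, the `env` tag key, the `@slack-vro-alerts` Slack handle, and the threshold and time window are all placeholders to be replaced with whatever LHDI actually exposes.

```python
import json

# Assumptions (not confirmed anywhere in this issue):
METRIC = "rabbitmq.connections"      # assumed metric name reported by the RabbitMQ integration
ENV_TAG = "env"                      # assumed tag key that separates dev, prod, etc.
SLACK_HANDLE = "@slack-vro-alerts"   # hypothetical Slack notification handle


def build_monitor(environment, window="last_5m", threshold=1):
    """Return a monitor definition that fires when connections drop below `threshold`."""
    # Datadog metric monitor query shape: <time_agg>(<window>):<space_agg>:<metric>{<scope>} <op> <value>
    query = f"min({window}):sum:{METRIC}{{{ENV_TAG}:{environment}}} < {threshold}"
    return {
        "name": f"RabbitMQ connection lost in {environment}",
        "type": "metric alert",
        "query": query,
        # Name the affected environment in the message so the alert is actionable.
        "message": f"RabbitMQ connections dropped in {environment}. {SLACK_HANDLE}",
        "tags": [f"{ENV_TAG}:{environment}", "team:vro"],
    }


if __name__ == "__main__":
    # The environment names here are examples only.
    for environment in ("dev", "prod"):
        print(json.dumps(build_monitor(environment), indent=2))
```

Building one monitor per environment keeps the affected environment in the alert title itself. A single monitor grouped by the environment tag would be an alternative if the tag turns out to be reliable across pods.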

Not included in this work
The following tickets cover monitoring of the individual apps (BIP, BGS), including their connectivity to RabbitMQ:
#3017
#3018

lisac added the backlog (To be addressed in a future sprint), needs-refinement (needs refinement before it's ready to work), and VRO-team labels on Sep 19, 2024
lisac changed the title from "track rabbitmq metrics in datadog" to "track rabbitmq metrics in datadog - update needed by 9/24" on Sep 19, 2024
Ponnia-M self-assigned this on Sep 22, 2024
@lisac
Contributor Author

lisac commented Sep 27, 2024

Hi @Ponnia-M, I noticed Tyler asked for confirmation/feedback on what appears to be a metrics dashboard (his comment: https://github.com/department-of-veterans-affairs/lighthouse-di-tenant-support/issues/39#issuecomment-2375302908). Is that something you can respond to?

meganhicks changed the title from "track rabbitmq metrics in datadog - update needed by 9/24" to "Alerting rabbitmq metrics in datadog - update needed by 9/24" on Sep 30, 2024
Ponnia-M changed the title from "Alerting rabbitmq metrics in datadog - update needed by 9/24" to "Alerting rabbitmq metrics in datadog" on Oct 1, 2024
meganhicks removed the needs-refinement (needs refinement before it's ready to work) label on Oct 1, 2024
@Ponnia-M
Contributor

I created a new LHDI issue regarding this.
