As a VRO engineer, I would like to be able to view RabbitMQ metrics in Datadog at the pod level (per environment), so that I can better troubleshoot issues and refine failover policies. Currently, we can only monitor RabbitMQ connections globally for each VRO application; we cannot pinpoint which environment(s) are affected when there is a drop in RabbitMQ connections.
Acceptance Criteria
1. RabbitMQ metrics are viewable in Datadog at the pod level, broken down by environment (e.g. RabbitMQ in dev, RabbitMQ in prod), to allow more granular monitoring of RabbitMQ connections.
2. Datadog alerts are configured to notify the VRO team on Slack when the RabbitMQ connection is lost in a given environment, with details specifying which environment(s) are affected to aid troubleshooting.
3. A separate Datadog dashboard is created for these RabbitMQ metrics.
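Criterion 2 could be met with a single Datadog multi-alert monitor grouped by environment, so that each environment triggers and resolves independently and the Slack message names the affected one. The sketch below only builds the monitor definition as a dict (it makes no API call); the result could then be submitted through the Datadog Monitors API or pasted into the monitor JSON editor. The metric name `rabbitmq.connections`, the `env` tag, the Slack handle `@slack-vro-alerts`, and the helper `build_connection_monitor` are all hypothetical placeholders, not confirmed VRO or Datadog configuration; the real metric and tag names depend on how the RabbitMQ integration is set up for the VRO pods.

```python
import json

# Hypothetical placeholders: replace with the real metric, tag, and Slack handle.
METRIC = "rabbitmq.connections"      # assumed metric reported by the RabbitMQ integration
GROUP_TAG = "env"                    # assumed tag identifying the environment of each pod
SLACK_HANDLE = "@slack-vro-alerts"   # hypothetical Slack channel handle


def build_connection_monitor(window: str = "last_5m", min_connections: int = 1) -> dict:
    """Build a Datadog multi-alert monitor definition, grouped per environment."""
    # "by {env}" makes Datadog evaluate (and alert on) each environment separately.
    query = f"min({window}):sum:{METRIC}{{*}} by {{{GROUP_TAG}}} < {min_connections}"
    # {{env.name}} is a template variable filled in with the triggering group's tag value.
    message = (
        f"RabbitMQ connections dropped below {min_connections} "
        f"in environment {{{{{GROUP_TAG}.name}}}}. {SLACK_HANDLE}"
    )
    return {
        "name": "[VRO] RabbitMQ connection lost",
        "type": "metric alert",
        "query": query,
        "message": message,
        "options": {
            "thresholds": {"critical": min_connections},
            # Also alert if the metric stops reporting entirely (e.g. the pod is gone).
            "notify_no_data": True,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_connection_monitor(), indent=2))
```

Grouping by tag in one monitor, rather than creating one monitor per environment, keeps new environments covered automatically as long as their pods carry the same tag.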
Not included in this work
The following tickets cover monitoring of individual apps (BIP, BGS), including their connectivity to RabbitMQ: #3017, #3018
meganhicks changed the title from "track rabbitmq metrics in datadog - update needed by 9/24" to "Alerting rabbitmq metrics in datadog - update needed by 9/24" (Sep 30, 2024)
Ponnia-M changed the title from "Alerting rabbitmq metrics in datadog - update needed by 9/24" to "Alerting rabbitmq metrics in datadog" (Oct 1, 2024)
Important: Respond by 9/24 to https://github.com/department-of-veterans-affairs/lighthouse-di-tenant-support/issues/39 (add a comment to the ticket) stating whether we will pursue next steps and, if so, which engineer will work with Tyler.