Epic Description:
As members of the VRO Team, our overarching objective is to improve the observability of our production systems, in line with the strategic priority set by the Benefits Portfolio. To accomplish this, we plan to conduct a thorough assessment of the current state of our platform's observability. The initiative is driven by the need to identify and resolve potential issues proactively, minimizing the risk that critical incidents go unnoticed. With a deeper understanding of our system's observability, we can make informed improvements that contribute to the overall stability and performance of our platform.
In Scope:
Assess the current state of observability on the VRO platform, covering all services and applications. Identify gaps in observability and develop an MVP.
MVP
Develop a single health dashboard tailored for the VRO Team encompassing all services
Incorporate essential metrics such as CPU usage, memory utilization, network traffic, pod availability, and recent deployments.
Establish benchmarks for CPU (X), memory (X), and network traffic (X).
Incorporate custom metrics for our platform applications so we can be aware when there are service outages.
Establish benchmarks for monitoring partner applications
Implement monitoring and proactive alerting mechanisms
Configure alerts to notify the VRO Team promptly in case of suboptimal application performance.
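The benchmark-and-alert items above could be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `Benchmark` class, the `breached` function, the metric names, and the limit values are all hypothetical, and the real benchmarks are still the undecided (X) values listed in the MVP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """A named metric and the limit above which the team should be alerted."""
    metric: str
    limit: float  # hypothetical value; the epic leaves the real limits as (X)


def breached(samples: dict[str, float], benchmarks: list[Benchmark]) -> list[str]:
    """Return the names of metrics whose current sample exceeds its benchmark.

    Metrics with no current sample are skipped rather than treated as breaches.
    """
    return [
        b.metric
        for b in benchmarks
        if b.metric in samples and samples[b.metric] > b.limit
    ]


# Illustrative limits only, not the VRO Team's agreed benchmarks.
benchmarks = [Benchmark("cpu_percent", 80.0), Benchmark("memory_percent", 75.0)]
alerts = breached({"cpu_percent": 91.5, "memory_percent": 40.0}, benchmarks)
```

In practice the list returned by `breached` would feed whatever notification channel the team chooses, and the limits would come from the agreed benchmarks once they are set.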
Not In Scope:
Incident response plan with defined SLAs. This will be addressed through a separate initiative.
Hypothesis:
By implementing a proactive monitoring system capable of detecting potential stability issues early, we aim to prevent issues before they arise. We hope to lessen any adverse impact on our partners' applications and give our partners insight into what we are monitoring and why it matters.