Silent Failures #3471

Open
dfitchett opened this issue Sep 17, 2024 · 0 comments
Labels
epic (A collection of user stories spanning multiple repositories), VRO-team, zero-silent-failures

dfitchett commented Sep 17, 2024

Description

As part of our commitment to a seamless experience for Veterans and claimants, this Epic focuses on detecting, mitigating, and preventing silent failures in VRO services that affect claim processing. A silent failure occurs when a critical error happens without the user being notified, leaving Veterans and other stakeholders unaware that their action (e.g., submitting a claim or uploading a document) has failed. The goal of this initiative is to systematically identify where silent failures may occur, implement monitoring and alerting, and ensure that every error is surfaced either to the end user or to the team internally so that action can be taken.

Why

Silent failures across VRO services, including APIs and background job processing, can significantly delay or block claim processing without Veterans or claimants ever being notified. This degrades the user experience and compromises the integrity of the VRO platform. Detecting, preventing, and addressing these silent errors is crucial to maintaining trust and efficiency in handling Veterans' claims.

Goals


  1. Detect any existing silent failures in VRO services that affect claims.
  2. Ensure that failures in both asynchronous and synchronous processes are properly surfaced.
  3. Establish robust monitoring and alerting mechanisms to capture failures in real time.
  4. Improve overall system reliability, reducing the chance of silent errors going unnoticed.
  5. Align the team with the goal of zero silent errors and promote transparency through consistent monitoring and reporting.
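Goal 2 hinges on the difference between an error that is swallowed and one that is surfaced. The Python sketch below illustrates that difference for a background job under stated assumptions: `surface_failures`, `upload_document`, and the in-memory `ALERTS` list are hypothetical names, not part of VRO, and a real service would forward alerts to whatever monitoring and alerting system the team adopts. The wrapper logs the failure, records an alert, and then re-raises so the caller is still notified.

```python
import logging
from functools import wraps
from typing import Callable, List

logger = logging.getLogger("vro.jobs")  # hypothetical logger name

# Hypothetical in-memory alert sink; stands in for a real alerting backend.
ALERTS: List[str] = []


def surface_failures(job_name: str) -> Callable:
    """Wrap a job so exceptions are logged, alerted, and re-raised, never swallowed."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Log with traceback so the failure is visible internally.
                logger.exception("Job %s failed", job_name)
                # Record an alert so someone can act on it.
                ALERTS.append(f"{job_name}: {exc}")
                # Re-raise so the caller (and ultimately the user) is notified.
                raise
        return wrapper
    return decorator


@surface_failures("upload_document")
def upload_document(claim_id: str, payload: bytes) -> str:
    """Hypothetical job: rejects empty uploads instead of silently succeeding."""
    if not payload:
        raise ValueError(f"empty document for claim {claim_id}")
    return f"uploaded {len(payload)} bytes for claim {claim_id}"


# Example: the failing call still reaches its caller, and an alert is recorded.
try:
    upload_document("123", b"")
except ValueError as err:
    print(f"caller notified: {err}")
print(ALERTS)  # ['upload_document: empty document for claim 123']
```

The key design choice is the final `raise`: logging alone would make the failure visible to the team but could still leave the user unaware, which is exactly the silent-failure pattern this Epic targets.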

Targeted Timelines

  1. Discovery - investigate and determine if VRO services have any silent failures - by October 1
  2. Remediation - address potential points of failure and have notification systems in place - by November 12
dfitchett added the VRO-team and epic labels on Sep 17, 2024