There has been a recent increase in customer issues. The issues have increased to more than 300%. indicating a need to change the escalation and support processes to resolve issues at a faster rate.
Currently, there is no clear bug prioritization and escalation process, which creates confusion among teams, resulting in issues not being addressed within SLAs. Also, there is a need to revise the test strategy, improve monitoring and observability, and optimize our cycle time through CI/CD pipelines.
We need a well-defined bug ticketing process. which involves appropriately categorizing issues in JIRA, defining bug priority and severity criteria, and prioritizing issues based on their impact on customer experience, business impact, and urgency. This prioritization will allow the team to focus on the most critical issues first.
Advancement into various levels of support needs to be improved upon. A clearly defined SLA needs to be set up. In cases where the issue cannot be resolved at the current support level, a clear escalation path should be in place. This could include escalation to a higher support level, involving engineering resources
Our test strategy needs to be revised to include performance and load testing. Our engineering infrastructure needs to accommodate these platform-focused tests. Additionally, customer issues have recently increased due to new features that were recently launched. A thorough regression test suite with agreed-upon user flows needs to be automated. A combination of web and API tests will be required to address testing gaps.
We will be defining clear and measurable KPIs to make sure our framework is successful. Some KPIs but not limited will include:
- “the number of all issues addressed within SLA”,
- “Number of P0/P1 issues resolved within SLA”
- “Defect resolution time”
- “Issue backlog rate” Further, we will also implement a JIRA dashboard to provide real-time visibility and transparency. This will help our team to address the root cause rather than just treat symptoms, leading to long-term improvements in support quality.
While we are addressing our current needs it is also essential to keep our long-term goals in mind. We want to instill “SHIFT LEFT” and bring in a DevOps culture where Quality is built within our engineering process. Our long-term focus needs to make improvements in observability, expanding automation test coverage using various tools.
Monitoring and Observability is a critical part of our system. Providing teams with improved logging and monitoring abilities into our complex systems and troubleshooting issues in real time. By using tools like Grafana and Prometheus we can improve our logging, visualization and alerting capabilities. This will have a multiplier effect and will boost our SLA times by over 40%.
We plan to incorporate DevOps practices into our development process. We plan to implement continuous integration and delivery (CI/CD)pipelines to automate the build, test, and deploy process, reducing the feedback cycle even further.
While we are improving and increasing our test coverage, it is also important to focus on performance and load testing our APIs. This will require planning and infrastructure changes as we will need to simulate real-time traffic. By expanding our test coverage to include performance testing we will provide a positive experience for our users.
By implementing these proposed changes, we can effectively scale to meet the growing number of customer-reported issues. We will continuously monitor KPIs to ensure that the implemented processes effectively address the underlying causes of these issues. As a result, we aim to enhance our quality program and improve customer satisfaction.