Add RCA methods #132

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

abtris
Owner

@abtris abtris commented Jan 29, 2024

No description provided.

* https://cloudpundit.com/2021/10/28/five-p-factors-for-root-cause-analysis/

The Five Ps (described in IT terms) — well, really six Ps, a problem and five P factors — are as follows:
* The **presenting problem** is not only the core impact, but also its broader consequences, which all should be examined and addressed. For instance, “The FizzBots service was down” becomes “Our network was unstable, resulting in FizzBots service failure. Our call center was overwhelmed, our customers are mad at us, and we need to pay out on our SLAs.”


⚠️ [alex] reported by reviewdog 🐶
Be careful with “failure”, it’s profane in some cases (rule: failure, source: retext-profanities)



⚠️ [alex] reported by reviewdog 🐶
Be careful with “mad”, it’s profane in some cases (rule: mad, source: retext-profanities)

* The **perpetuating factors** are the things that resulted in the incident continuing (or becoming worse), once triggered. For instance, “When the network was down, application components queued requests, ran out of memory, crashed, and had to be manually recovered.”
* The **predisposing factors** are the long-standing things that made it more likely that a bad situation would result. For instance, “We do not have automation that checks for bad configurations and prevents their propagation.” or “We are running outdated software on our load-balancers that contains a known bug that results in sometimes sending requests to unresponsive backends.”
* The **protective factors** are things that helped to limit the impact and scope (essentially, your resilience mechanisms). For instance, “We have automation that detected the problem and reverted the configuration change, so the network outage duration was brief.”
* The **present factors** are other factors that were relevant to the outcome (including “where we got lucky”). For instance, “A new version of an application component had just been pushed shortly before the network outage, complicating problem diagnosis,” or “The incident began at noon, when much of the ops team was out having lunch, delaying response.”
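
As a rough illustration, the factors quoted in this hunk could be captured in a small record so that a review can see which categories have not been examined yet. This is a minimal sketch, not part of the linked article: the class name `IncidentReview`, its field names, and the `unexamined` helper are all hypothetical, and only the factors visible in this hunk are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentReview:
    """Hypothetical record grouping review notes by the P factors quoted above."""

    presenting_problem: str
    perpetuating: list[str] = field(default_factory=list)
    predisposing: list[str] = field(default_factory=list)
    protective: list[str] = field(default_factory=list)
    present: list[str] = field(default_factory=list)

    def unexamined(self) -> list[str]:
        """Return the names of factor categories that have no notes yet."""
        factors = {
            "perpetuating": self.perpetuating,
            "predisposing": self.predisposing,
            "protective": self.protective,
            "present": self.present,
        }
        return [name for name, notes in factors.items() if not notes]


# Example notes are taken from the quoted text; the grouping is illustrative.
review = IncidentReview(
    presenting_problem="Our network was unstable, resulting in FizzBots service failure.",
    perpetuating=["Application components queued requests and ran out of memory."],
    protective=["Automation detected the problem and reverted the configuration change."],
)
print(review.unexamined())  # ['predisposing', 'present']
```

Listing the empty categories is one way to prompt a reviewer to look beyond the trigger; a real template may prefer free-form prose over lists.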


⚠️ [alex] reported by reviewdog 🐶
“just” may be insensitive, try not to use it (rule: just, source: retext-equality)

* [Using a Fishbone (or Ishikawa) Diagram to Perform 5-why Analysis | K Bulsuk: Full Speed Ahead](https://www.bulsuk.com/2009/08/using-fishbone-diagram-to-perform-5-why.html)

* 5 Why - The Five Why’s is still considered as a best practice by many teams and is a common way to run the root-cause analysis process. The idea here is to ask “why” in succession, going deeper and uncovering more information each time. - https://newsletter.pragmaticengineer.com/p/incident-review-best-practices
* The framework is very easy to get started, when teams don’t do much digging into incidents. However, as Andrew Hatch at LinkedIn shares in the talk [Learning More from Complex systems](https://www.usenix.org/conference/srecon21/presentation/hatch), there are risks to relying on the Five Whys:


⚠️ [alex] reported by reviewdog 🐶
“easy” may be insensitive, try not to use it (rule: easy, source: retext-equality)


> *The danger of the Five Whys is how, by following it, we might miss out on other root causes of the incident. (...)* ***We’re not broadening our understanding.*** *We’re just trying to narrow down on one thing, fix it, and hope that this will make the incident not happen again.*
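
The "ask why in succession" idea can be sketched as a linear chain, which also shows the narrowing the quote warns about: the chain ends at a single answer. This is a minimal sketch under stated assumptions; the function name `five_whys` and the example answers are made up for illustration and do not come from the linked sources.

```python
def five_whys(problem, answers, max_depth=5):
    """Pair each successive 'why' question with its answer, up to max_depth levels."""
    chain = []
    current = problem
    for answer in answers[:max_depth]:
        chain.append((f"Why: {current}", answer))
        current = answer  # the next question digs into the previous answer
    return chain


# Illustrative answers only; a real review would gather these from the team.
answers = [
    "The network was unstable",
    "A bad configuration was propagated",
    "No automation checks configurations before rollout",
]
chain = five_whys("The FizzBots service was down", answers)
for question, answer in chain:
    print(question, "->", answer)

# A linear chain yields exactly one final answer, so other contributing
# factors never appear unless they are recorded separately.
last_answer = chain[-1][1]
```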


⚠️ [alex] reported by reviewdog 🐶
“just” may be insensitive, try not to use it (rule: just, source: retext-equality)

* The scientific method can be integrated into RCA by using cycles of [PDCA](https://www.isixsigma.com/methodology/plan-do-check-act/six-sigma-pdca-steroids/). The planning phases consist of describing the problem, collecting data and forming a hypothesis.
* **P**: Whether freshly formed or taken from an Ishikawa diagram, the hypothesis should make some form of prediction (or *plan*), such as “measurement deviation” predicting “parts will be measured out of specification.”
* **D**: The next step is *do* – where the hypothesis is evaluated. This could be as simple as measuring a part or as elaborate as designing a new type of test method.
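
The *plan* and *do* steps described above can be sketched as a hypothesis that carries a prediction, plus an evaluation of that prediction against data. This is a minimal sketch under stated assumptions: the `Hypothesis` class, the `do_step` function, the sample measurements, and the specification limits are all hypothetical, and the check and act phases are not covered because this hunk does not describe them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    cause: str
    prediction: str
    # Hypothetical evaluation hook: returns True when the prediction holds.
    test: Callable[[], bool]


def do_step(hypotheses):
    """Evaluate each hypothesis and return the causes whose predictions held."""
    return [h.cause for h in hypotheses if h.test()]


# Made-up sample data: part lengths in millimetres and an assumed spec range.
measurements_mm = [10.4, 10.6, 9.2]
spec_mm = (9.5, 10.5)

hypotheses = [
    Hypothesis(
        cause="measurement deviation",
        prediction="parts will be measured out of specification",
        test=lambda: any(
            not (spec_mm[0] <= m <= spec_mm[1]) for m in measurements_mm
        ),
    ),
]
supported = do_step(hypotheses)
print(supported)  # ['measurement deviation']
```

Here the "do" step is the cheap case the text mentions, measuring parts; a more elaborate test method would replace the lambda with a longer procedure.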


⚠️ [alex] reported by reviewdog 🐶
“simple” may be insensitive, try not to use it (rule: simple, source: retext-equality)



[LanguageTool] reported by reviewdog 🐶
Possible agreement error. The noun ‘why’ seems to be countable. (CD_NN[1])
Suggestions: whys
Rule: https://community.languagetool.org/rule/show/CD_NN?lang=en-US&subId=1
Category: GRAMMAR


netlify bot commented Jan 29, 2024

Deploy Preview for house-keeper-proficiencies-68654 ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | c79ec9f |
| 🔍 Latest deploy log | https://app.netlify.com/sites/house-keeper-proficiencies-68654/deploys/65b7966f5cc3250008b6c083 |
| 😎 Deploy Preview | https://deploy-preview-132--house-keeper-proficiencies-68654.netlify.app |

@abtris abtris merged commit dfc284d into master Jan 29, 2024
9 checks passed
@abtris abtris deleted the abtris/update-postmortem-methods branch January 29, 2024 12:26