NOTE: Outdated! Please see the Auctionet incident response plan.

TODO: Copy over any useful info from here to there, and then remove this file and any links to it within this repo.

Incident response

A checklist for what to do for incidents such as site downtime.

Checklist

Until the problem is resolved

  • Assign an incident lead – a single person who is responsible for this checklist. They should delegate tasks explicitly.
  • If there are remote workers, "get everyone in the same room" by setting up a video and audio link, e.g. Zoom.
  • The incident lead should assign a communicator. The communicator ensures that we inform every affected party. This may be in person, by chat, by phone, via Auctionet system messages, etc.
    • The support team.
    • Auction houses (see example bulletin).
    • Buyers, sellers, or other affected parties (see example campaign – remember to disable all other campaigns).
  • Consider communicating:
    • When we first notice the problems.
    • When there is some workaround.
    • When the problems are resolved (from the affected party's standpoint).
  • The incident lead should assign a team of deep delvers to dig into the underlying issue.
  • The incident lead should assign a team of quickfixers to see what we can do right now to minimise the impact and unblock affected parties.
  • The incident lead may want to create a Trello card to keep track of things for this incident.

Anyone not tapped by the incident lead is free to keep working on other things. It is the lead's responsibility to call for all hands if necessary.

Post mortem

Soon after the problem is resolved, we want to hold a "post mortem" meeting.

The goal of the meeting is to come up with any learnings and actions that let us do better work in future.

  • CTO and product owner should attend so we can decide what resources to allocate.
  • The discussion should be facilitated (have someone managing it) to keep us on track.

Meeting agenda

  • Timeline: What happened? What did we do? What happened then? Where did we leave things?
  • How did this affect end users (auction houses, buyers, sellers, support, finance, …)? What can we improve as a result?
  • Reflect on the post mortem. Can we do post mortems better? Update this document with any learnings.

Inspiration