When you are on-call, it is important to be responsive. This post will explain why and give some tips on how to be more responsive.
For each service that your team owns, you should know the consequences of an outage. For some services with "just internal customers", downtime might not seem like a big deal. In that case, I would argue that you should think about turning that service off, but that is another story ;)
An outage means lost revenue, lost customer trust, or both. Ideally, teams can roughly quantify how many dollars an outage costs them per 5 minutes or per hour.
It doesn't have to be a perfectly accurate estimate, but how much revenue, opportunity cost, or lead generation, in dollars, does that service represent on average every 5 minutes?
Putting it into dollars really helps you and your team develop a sense of urgency. If a service being down costs the company $1000 per 5 minutes, the difference between you responding to an issue in 10 minutes versus 25 minutes is 15 minutes of downtime, or $3k. And this isn't a one-time situation. It applies to every customer-impacting incident over time. Across a few dozen incidents, that can add up to $100k per year!
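If you want to sanity-check this kind of estimate for your own services, the arithmetic fits in a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the $1000 per 5 minutes rate is just the example figure from above, so swap in whatever number your team comes up with.

```python
def outage_cost(minutes_down, dollars_per_interval=1000.0, interval_minutes=5.0):
    """Rough cost of an outage that lasts `minutes_down` minutes.

    The default rate ($1000 per 5 minutes) is the example figure from
    this post, not a real measurement.
    """
    return minutes_down / interval_minutes * dollars_per_interval


def cost_of_slow_response(fast_minutes, slow_minutes, **rate):
    """Extra cost of responding in `slow_minutes` instead of `fast_minutes`."""
    return outage_cost(slow_minutes, **rate) - outage_cost(fast_minutes, **rate)


# Responding in 25 minutes instead of 10 minutes at $1000 per 5 minutes.
extra = cost_of_slow_response(10, 25)
print(f"${extra:,.0f} per incident")  # $3,000 per incident
```

Multiply the per-incident figure by however many customer-impacting incidents you expect in a year to get an annual number.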
It can be helpful to distinguish Tier-1 services from Tier-2 and to establish some goals around how quickly an on-call engineer will acknowledge an issue and start working towards resolving it. The time to measure is the time between being paged and being fully online and checking your graphs, Slack, etc.
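One way to make those goals concrete is to write them down per tier and check each page against them. Here is a minimal sketch in Python. The tier names, the 5 and 15 minute targets, and the function name are all assumptions for illustration; pick targets that match what an outage actually costs you.

```python
from datetime import datetime, timedelta

# Hypothetical acknowledgement targets per tier. These numbers are
# placeholders, not recommendations.
ACK_TARGETS = {
    "tier-1": timedelta(minutes=5),
    "tier-2": timedelta(minutes=15),
}


def met_ack_target(tier, paged_at, online_at):
    """True if the on-call was fully online within the target for `tier`."""
    return online_at - paged_at <= ACK_TARGETS[tier]


paged = datetime(2024, 1, 1, 3, 0)
online = datetime(2024, 1, 1, 3, 12)  # online 12 minutes after the page

print(met_ack_target("tier-1", paged, online))  # False
print(met_ack_target("tier-2", paged, online))  # True
```

Tracking this per incident gives you a simple way to see whether your response times are actually improving.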
Here are some tips for improving your responsiveness:
By preparing for the worst-case scenario, you will be ahead of the curve when you are woken up at 3am and can't think straight for 5 to 10 minutes.
If you are engaged in something and get paged, think about that $1k per 5 minutes. Visualize a dump truck filled with cash backing up to a dumpster fire and pouring it in. Another truck is coming in 5 minutes.
This will help you set aside whatever you are engaged with so you can quickly check in and assess the severity and customer impact of whatever you were just paged about.