While on-call, not every high-priority alert you get will be a real customer-impacting incident. Some will be false alarms. A common anti-pattern is for teams to ignore false alarms, which turns on-call into a source of alert fatigue: you get false alarms so often that you stop paying close attention to the alarms and the system. This is the path to burnout and unsustainable on-call.
When you get a false alarm, examine the data, look at how the alarm is set up, and question whether it is the best way to accomplish the goal of that alarm.
If your team has a lot of false alarms, this might feel like an overwhelming task. But unless you and your team start to improve the situation, you will never get out from under this constant source of stress. You don’t have to fix everything all at once; try to make some improvements each time you are on-call.
Propose a better way to set up the alarm through code review or 2PR, and make that improvement.
Similarly, when you create new alarms, get a code review or config review from the team.
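One way to examine the data behind a noisy alarm before proposing a change is to replay recent metric samples against both the current rule and the proposed one, and compare how often each would have fired. The sketch below is only an illustration under stated assumptions: `AlarmRule`, `count_firings`, the threshold values, and the sample data are all hypothetical and not tied to any particular monitoring system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlarmRule:
    """Hypothetical alarm definition, assumed for illustration.

    The alarm fires once `datapoints` consecutive samples exceed `threshold`.
    """
    threshold: float
    datapoints: int


def count_firings(samples: List[float], rule: AlarmRule) -> int:
    """Count how many times the rule would enter the alarm state."""
    firings = 0
    streak = 0          # consecutive samples above the threshold
    in_alarm = False
    for value in samples:
        if value > rule.threshold:
            streak += 1
            if streak >= rule.datapoints and not in_alarm:
                in_alarm = True
                firings += 1
        else:
            # A healthy sample resets the streak and clears the alarm.
            streak = 0
            in_alarm = False
    return firings


# Made-up error-rate samples (percent), one per minute:
# two one-minute blips followed by one sustained problem.
samples = [0.2, 0.3, 6.0, 0.2, 0.4, 7.5, 0.3, 0.2, 6.5, 7.0, 8.2, 7.9, 0.3]

current = AlarmRule(threshold=5.0, datapoints=1)   # pages on any single spike
proposed = AlarmRule(threshold=5.0, datapoints=3)  # pages only on sustained breaches

print("current rule firings:", count_firings(samples, current))
print("proposed rule firings:", count_firings(samples, proposed))
```

With this made-up data, the current rule would page three times while the proposed rule would page once, for the sustained breach only. A comparison like this can go into the review alongside the proposed change, so the team can judge whether the new rule still catches the incidents that matter.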
With on-call, it is important to set good boundaries, capture recurring pain points, and bring them up with the team.