Is there a way to have the stats display only if the outage lasts longer than 5 minutes? Much of what’s currently shown is just from server restarts.
What we’re looking for is a tiered alerting model. For example, the MTB sites previously used this setup:
5 minutes: site considered down and initial alert sent (Tier 1).
10 minutes: repeat alert if still down.
20 minutes: escalation alert sent to additional recipients (Tier 2).
Recovery: email sent once the site was back up.
We’d like to strike a balance so that people are told about real outages without being annoyed by repeated alerts.
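To make the request concrete, here is a minimal sketch of the tiers listed above, written in Python. It is not tied to any particular monitoring tool. The names (Tier, TIERS, alerts_due, recovery_needed) and the recipient group labels are hypothetical, and it assumes the Tier 2 escalation goes to the Tier 1 recipients plus the additional ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    after_minutes: int   # minutes of continuous downtime before this alert fires
    label: str
    recipients: tuple    # hypothetical recipient group names


# Thresholds taken from the previous MTB setup described above.
TIERS = (
    Tier(5, "initial alert (Tier 1)", ("tier1",)),
    Tier(10, "repeat alert", ("tier1",)),
    Tier(20, "escalation (Tier 2)", ("tier1", "tier2")),
)


def alerts_due(previous_minutes_down, minutes_down):
    """Return the tiers whose threshold was crossed since the last check.

    Each tier fires once per outage, so nobody gets the same alert twice.
    """
    return [
        t for t in TIERS
        if previous_minutes_down < t.after_minutes <= minutes_down
    ]


def recovery_needed(minutes_down):
    """Send a recovery email only if the outage was long enough to alert on."""
    return minutes_down >= TIERS[0].after_minutes


if __name__ == "__main__":
    # A 2-minute server restart: no alerts and no recovery email.
    print(alerts_due(0, 2), recovery_needed(2))
    # An outage first checked at 4 minutes, then again at 10 minutes.
    for tier in alerts_due(4, 10):
        print(tier.after_minutes, tier.label, tier.recipients)
```

With this shape, a short restart never reaches the 5-minute threshold, so it would generate no alert and no recovery email. The same threshold could be used to decide whether an outage appears in the stats.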