Alert behavior
Notify-only vs auto-analyze, cooldown windows, and daily alert caps
When a rule fires, Watchdog has to decide what to do. Two settings control that: the action and the rate-limiting windows.
Action: notify-only vs auto-analyze
| Action | What happens when the rule fires |
|---|---|
| notify_only | A short alert is sent to the project's notification channels (or rule-specific channels if overridden). No analysis is run. |
| auto_analyze | A full analysis is run automatically using the rule's lookback window as the time range. The diagnosis is then posted to channels. |
Notify-only is the safer default: it consumes no quota and keeps noise low. Use it for new rules until you've seen them fire a few times and are confident they point at real issues.
Auto-analyze is the "page me with the answer, not the question" mode. It consumes from your monthly analysis quota every time the rule fires.
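To get a feel for the quota impact, here is a rough worst-case estimate in Python. It is only a sketch: the function names are made up for illustration, and it assumes that alerts suppressed by the cooldown or the daily cap (both covered later on this page) do not trigger an analysis, which this page does not state explicitly.

```python
import math

MINUTES_PER_DAY = 24 * 60


def max_alerts_per_day(cooldown_minutes: int, daily_cap: int) -> int:
    """Upper bound on alerts one rule can send in a day.

    With at most one alert per cooldown window, a day fits at most
    ceil(1440 / cooldown) alerts; the daily cap bounds that further.
    """
    by_cooldown = math.ceil(MINUTES_PER_DAY / cooldown_minutes)
    return min(by_cooldown, daily_cap)


def worst_case_monthly_analyses(
    cooldown_minutes: int = 30, daily_cap: int = 10, days: int = 30
) -> int:
    """Worst-case analyses per month for one auto_analyze rule.

    Assumption: only alerts that are actually sent consume quota.
    """
    return max_alerts_per_day(cooldown_minutes, daily_cap) * days


print(worst_case_monthly_analyses())  # 300 with the documented defaults
```

With the defaults, the daily cap (10) is the binding limit rather than the cooldown (which alone would allow 48 alerts per day).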
Cooldown
The cooldown is the minimum gap between consecutive alerts from the same rule.
| Field | Notes |
|---|---|
| Cooldown minutes | Default 30 |
With the default 30-minute cooldown, if a rule fires and its condition is met again 5 minutes later, the second alert is suppressed. The rule keeps evaluating in the background and can alert again once the cooldown expires.
Cooldown protects you from alert storms when an issue persists across many evaluation windows. Set it equal to or longer than your lookback window. Otherwise the same incident can produce multiple alerts as it slides through successive evaluation windows.
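The suppression rule can be sketched in a few lines of Python. This is an illustration, not Watchdog's implementation: `should_alert` is a hypothetical helper, and treating an alert exactly at the cooldown boundary as allowed is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_alert(
    last_alert_at: Optional[datetime],
    now: datetime,
    cooldown_minutes: int = 30,
) -> bool:
    """Return True if the cooldown since the last alert has expired."""
    if last_alert_at is None:
        # The rule has never alerted, so nothing to suppress.
        return True
    # Assumption: an alert exactly at the boundary is allowed.
    return now - last_alert_at >= timedelta(minutes=cooldown_minutes)


first = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_alert(first, first + timedelta(minutes=5)))   # False: suppressed
print(should_alert(first, first + timedelta(minutes=30)))  # True: cooldown over
```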
Daily alert cap
A hard ceiling on alerts per rule per day.
| Field | Notes |
|---|---|
| Daily alert cap | Default 10 |
Once the cap is hit, the rule stops firing for the rest of the day. The count resets at midnight UTC. The activity feed shows that the cap was reached so you know to revisit the rule.
This is the last line of defense against runaway noise. In practice, you would usually tune the rule's thresholds long before hitting the cap.
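A per-rule counter keyed on the UTC calendar date is one way to picture the cap and its midnight reset. The class below is a hypothetical sketch under that assumption; it expects timezone-aware datetimes.

```python
from datetime import date, datetime, timezone
from typing import Optional


class DailyCap:
    """Counts alerts per UTC day and refuses any beyond the cap."""

    def __init__(self, cap: int = 10) -> None:
        self.cap = cap
        self.day: Optional[date] = None
        self.count = 0

    def allow(self, now: datetime) -> bool:
        # Convert to UTC so the reset happens at midnight UTC.
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # First alert of a new UTC day: reset the counter.
            self.day = today
            self.count = 0
        if self.count >= self.cap:
            return False
        self.count += 1
        return True


limiter = DailyCap(cap=2)
noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print([limiter.allow(noon) for _ in range(3)])  # [True, True, False]
```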
Channel routing
By default, alerts go to the project's notification channels, specifically those that pass the project's severity threshold for the alert's severity.
A rule can override that with rule-specific channels, chosen as a subset of the project's channels. This is useful for sending Watchdog alerts to a dedicated #watchdog-alerts channel instead of the general project channel.
| Setting | Behavior |
|---|---|
| No channel override | Use the project's default channels |
| Channel override list | Use only the channels listed (must be channels already attached to the project) |
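The two rows above amount to a small resolution step, sketched here in Python. The function name and the choice to raise an error for unattached channels are assumptions for illustration, and severity filtering is left out for brevity.

```python
from typing import List, Optional, Sequence


def resolve_channels(
    project_channels: Sequence[str],
    override: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick the channels an alert goes to."""
    if not override:
        # No override: fall back to the project's default channels.
        return list(project_channels)
    # An override may only name channels already attached to the project.
    unknown = [c for c in override if c not in project_channels]
    if unknown:
        raise ValueError(f"channels not attached to the project: {unknown}")
    return list(override)


project = ["#general", "#watchdog-alerts"]
print(resolve_channels(project))                        # both project channels
print(resolve_channels(project, ["#watchdog-alerts"]))  # only the override
```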
Auto-disable
If a rule fails to evaluate repeatedly (e.g., the log source is unreachable or credentials are invalid), Watchdog auto-disables the rule and notifies you. The rule stays disabled until you fix the upstream issue and re-enable it manually. This is a safety mechanism so that a broken rule doesn't silently miss real incidents.
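One plausible way to model this is a consecutive-failure counter. The sketch below is hypothetical: this page does not say how many failures count as "repeatedly", so `max_consecutive_failures=3` is a placeholder, and resetting the counter on a successful evaluation is also an assumption.

```python
class RuleHealth:
    """Tracks evaluation failures and disables the rule after too many."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        # Placeholder threshold; the real value is not documented here.
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = 0
        self.enabled = True

    def record(self, evaluation_ok: bool) -> bool:
        """Record one evaluation result and return whether the rule is enabled."""
        if not self.enabled:
            return False
        if evaluation_ok:
            self.failures = 0  # assumption: a success clears the streak
        else:
            self.failures += 1
            if self.failures >= self.max_consecutive_failures:
                self.enabled = False  # stays off until re-enabled manually
        return self.enabled

    def reenable(self) -> None:
        """Manual re-enable after the upstream issue is fixed."""
        self.failures = 0
        self.enabled = True
```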
Recommended starting config
For a new rule on a production service:
| Setting | Value |
|---|---|
| Action | notify_only for the first week, then upgrade to auto_analyze |
| Interval | 15 min |
| Lookback | 30 min |
| Cooldown | 30 min |
| Daily cap | 10 |
Tune from there based on how often it actually fires.
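The starting values can be captured as a small Python structure with a sanity check for the cooldown-versus-lookback guidance. The field names here are illustrative assumptions, not Watchdog's actual configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleConfig:
    """Recommended starting values for a new production rule."""

    action: str = "notify_only"  # switch to "auto_analyze" after the first week
    interval_minutes: int = 15
    lookback_minutes: int = 30
    cooldown_minutes: int = 30
    daily_cap: int = 10

    def warnings(self) -> List[str]:
        """Return human-readable warnings for questionable settings."""
        problems = []
        if self.action not in ("notify_only", "auto_analyze"):
            problems.append(f"unknown action: {self.action}")
        if self.cooldown_minutes < self.lookback_minutes:
            problems.append(
                "cooldown is shorter than lookback; one incident may alert repeatedly"
            )
        return problems


print(RuleConfig().warnings())                     # []
print(RuleConfig(cooldown_minutes=10).warnings())  # one cooldown warning
```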
