Scheduling
Fixed-interval checks vs randomized averages — pick what fits your traffic shape
Watchdog evaluates each rule on a schedule. There are two modes.
Fixed interval
The rule runs at a regular cadence — every N minutes, on the dot.
| Field | Notes |
|---|---|
| Interval minutes | The cadence, in minutes (default 15) |
Best for: most rules. Predictable, easy to reason about, and the default for new rules.
Randomized average
The rule runs at randomized times that average out to N checks per day.
| Field | Notes |
|---|---|
| Avg checks per day | The target average (e.g., 96 = ~every 15 minutes on average) |
Best for: services where you want monitoring coverage but don't want all rules firing exactly on the same minute. Useful if you're running 5+ rules and want to spread the evaluation load on your log source.
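To make the relationship between "Avg checks per day" and the effective cadence concrete, here is a minimal Python sketch. How Watchdog actually randomizes run times is not described here, so the exponential-gap sampling below is only one plausible approach, and the function names `mean_interval_minutes` and `sample_daily_schedule` are hypothetical.

```python
import random

MINUTES_PER_DAY = 24 * 60


def mean_interval_minutes(avg_checks_per_day: float) -> float:
    """Average gap between checks implied by a per-day target."""
    if avg_checks_per_day <= 0:
        raise ValueError("avg_checks_per_day must be positive")
    return MINUTES_PER_DAY / avg_checks_per_day


def sample_daily_schedule(avg_checks_per_day: float, rng: random.Random) -> list[float]:
    """Draw one day's check times, in minutes after midnight.

    Assumption: gaps between checks are exponentially distributed
    (a Poisson process). Watchdog's real randomization may differ.
    """
    rate_per_minute = avg_checks_per_day / MINUTES_PER_DAY
    times = []
    t = rng.expovariate(rate_per_minute)
    while t < MINUTES_PER_DAY:
        times.append(t)
        t += rng.expovariate(rate_per_minute)
    return times


print(mean_interval_minutes(96))  # 15.0

# The number of checks on any single day varies; only the long-run
# average approaches the target.
schedule = sample_daily_schedule(96, random.Random(42))
print(len(schedule), [round(t) for t in schedule[:5]])
```

Because each rule draws its own times, several rules configured with the same average are unlikely to hit the log source in the same minute, which is the point of this mode.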
Picking an interval
The interval bounds how quickly Watchdog can detect an issue. With a 15-minute interval, an issue that starts just after a run goes unnoticed for up to 15 minutes, until the next run.
| Interval | When to use |
|---|---|
| 5 min | Critical paths, tight SLOs, high-traffic services |
| 15 min | Default — good for most production services |
| 30 min | Background services, batch jobs, low-traffic admin |
| 60 min | Cost-conscious monitoring on quiet services |
Shorter intervals catch issues faster but consume more scheduled checks. See the cost calculator to estimate.
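As a rough illustration of the trade-off, the sketch below computes how many scheduled checks one rule consumes per day at each interval from the table. It is plain arithmetic and not the cost calculator itself. The helper name `checks_per_day` is hypothetical, and how checks map to cost depends on your plan.

```python
MINUTES_PER_DAY = 24 * 60


def checks_per_day(interval_minutes: int) -> float:
    """Scheduled checks one fixed-interval rule consumes in 24 hours."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY / interval_minutes


for interval in (5, 15, 30, 60):
    print(
        f"{interval:>2} min interval: "
        f"{checks_per_day(interval):.0f} checks/day per rule, "
        f"worst-case detection delay about {interval} min"
    )
```

Moving a rule from 15 minutes to 5 minutes triples its check count (96 to 288 per day) in exchange for cutting the worst-case delay by two thirds.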
Lookback vs interval
These are independent settings:
- Interval — how often the rule runs (every X minutes)
- Lookback — how much history each run examines (last Y minutes)
A common pattern is interval = 15min, lookback = 30min. Each run
looks back further than the gap between runs, so transient issues are
seen by at least two consecutive runs — useful for spike detection.
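A rule with interval = 60min, lookback = 30min leaves a 30-minute blind
spot in every hour: anything that happens in the first 30 minutes after
a run falls outside the next run's lookback window and is never
examined. Avoid that configuration.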
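The interval and lookback relationship can be checked with a few lines of Python. This is a sketch for fixed-interval rules only, and the function name `coverage` is hypothetical, not part of Watchdog.

```python
def coverage(interval_minutes: int, lookback_minutes: int) -> tuple[int, int]:
    """Return (blind_spot_minutes, min_runs_seeing_each_moment).

    blind_spot_minutes: time between consecutive runs that no run examines.
    min_runs_seeing_each_moment: the fewest runs whose lookback window
    contains any given moment (0 means some moments are never seen).
    """
    if interval_minutes <= 0 or lookback_minutes <= 0:
        raise ValueError("interval and lookback must be positive")
    blind_spot = max(0, interval_minutes - lookback_minutes)
    min_runs = lookback_minutes // interval_minutes
    return blind_spot, min_runs


print(coverage(15, 30))  # (0, 2)  every moment is seen by two runs
print(coverage(15, 15))  # (0, 1)  full coverage, no overlap
print(coverage(60, 30))  # (30, 0) half of every hour is never examined
```

A quick rule of thumb follows from this: keep lookback at least equal to the interval for full coverage, and at least twice the interval if you want every moment seen by two runs.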
