Rule types
Error count, error rate, pattern match, and spike detection — when to use each
Every Watchdog rule has one of four types. Pick based on what "something's wrong" looks like in your logs.
Error count threshold
Fires when the number of error-level logs in the lookback window exceeds a fixed count.
| Field | Notes |
|---|---|
| Count threshold | The number that triggers the alert |
| Lookback minutes | How far back to count (default 30) |
Best for: services with steady, low-error baselines where any spike is interesting. Good first rule for new projects.
Example: "Notify me when checkout-api logs more than 50 errors in 30 minutes."
Count thresholds don't account for traffic — at peak load, even a healthy service might exceed the threshold. Use error rate if your traffic varies a lot.
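The count check reduces to "errors in window vs. a fixed number." A minimal sketch, assuming error timestamps are available as a list (the function name and signature are illustrative, not Watchdog's actual API):

```python
from datetime import datetime, timedelta

def count_rule_fires(error_timestamps, now, count_threshold=50, lookback_minutes=30):
    """Fire when the number of error logs inside the lookback window exceeds the threshold."""
    window_start = now - timedelta(minutes=lookback_minutes)
    errors_in_window = sum(1 for t in error_timestamps if t >= window_start)
    return errors_in_window > count_threshold
```

Note that only errors inside the window count; older errors age out as the window slides forward.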
Error rate threshold
Fires when the ratio of error logs to total logs exceeds a percentage.
| Field | Notes |
|---|---|
| Rate threshold | Decimal between 0 and 1 — e.g., 0.05 for 5% |
| Min error count | Floor below which the rule won't fire even if the rate is exceeded (default 5) — prevents false positives at low traffic |
| Lookback minutes | How far back to evaluate (default 30) |
Best for: services with variable traffic where the proportion of errors matters more than the absolute count.
Example: "Notify me when checkout-api error rate exceeds 5% over 30 minutes — but only if there are at least 20 errors total."
The min-error-count guard is important. Without it, a service that saw 1 request and 1 error would have a 100% error rate and trigger the alert.
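The interplay between the rate threshold and the min-error-count floor can be sketched as follows (a rough illustration, not Watchdog's actual implementation):

```python
def rate_rule_fires(error_count, total_count, rate_threshold=0.05, min_error_count=5):
    """Fire when the error ratio exceeds the threshold, subject to the min-count floor."""
    if total_count == 0:
        return False  # no traffic, nothing to evaluate
    if error_count < min_error_count:
        return False  # guard: too few errors to be meaningful at low traffic
    return error_count / total_count > rate_threshold
```

With the defaults, 1 error out of 1 request (a 100% rate) does not fire, but 20 errors out of 100 requests does.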
Pattern match (regex)
Fires when any log line in the lookback window matches a regex pattern.
| Field | Notes |
|---|---|
| Patterns | One or more { label, regex } entries — first match fires the rule |
| Lookback minutes | How far back to scan (default 30) |
Best for: known-bad strings you've seen before — circuit-breaker trips, OOM, specific error codes, deprecated API hits.
Example patterns:

```
{ label: "OOM", regex: "OutOfMemoryError" }
{ label: "Stripe failure", regex: "stripe.*Connection refused" }
{ label: "Deprecated endpoint", regex: "DEPRECATED_API_HIT" }
```

The label is what's shown in the alert; the regex is evaluated against the log message field.
Test your patterns with dry-run before enabling — a slightly-too-broad regex can match thousands of lines and fire constantly.
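First-match semantics can be sketched like this, assuming patterns are supplied as `{ label, regex }` dicts (the function name is illustrative, not the product's API):

```python
import re

def pattern_rule_fires(log_lines, patterns):
    """Return the label of the first pattern that matches any log line, or None."""
    compiled = [(p["label"], re.compile(p["regex"])) for p in patterns]
    for line in log_lines:
        for label, rx in compiled:
            if rx.search(line):
                return label  # first match fires the rule
    return None
```

Running this kind of check over a sample of recent logs is essentially what a dry run does: it shows which label would fire (and how often) before the rule goes live.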
Spike detection
Fires when error volume in the lookback window exceeds a multiple of the baseline from the previous comparable window.
| Field | Notes |
|---|---|
| Spike multiplier | The factor by which current window must exceed the prior baseline (default 3.0) |
| Min error count | Floor for the current window (default 5) |
| Lookback minutes | The window length (default 30) |
Best for: "I don't know what normal looks like, but I want to know when it changes." Good for services where the steady state varies and absolute thresholds don't work.
Example: "Notify me when error volume in the last 30 minutes is 3× higher than the 30 minutes before that — and at least 5 errors."
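The comparison between the current window and the prior baseline window can be sketched as below. How a zero baseline is handled here is an assumption for illustration (any volume at or above the min count counts as a spike), not documented behavior:

```python
def spike_rule_fires(current_count, baseline_count, spike_multiplier=3.0, min_error_count=5):
    """Fire when the current window's error count exceeds the baseline by the multiplier."""
    if current_count < min_error_count:
        return False  # floor: too few errors to call a spike
    if baseline_count == 0:
        return True  # assumption: errors after a silent baseline count as a spike
    return current_count > baseline_count * spike_multiplier
```

With the defaults, 20 errors against a baseline of 5 fires (20 > 15), while 10 against 5 does not.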
Choosing between them
| If you... | Use |
|---|---|
| Have a quiet service that should never error | Count threshold |
| Have a busy service where ratios matter more | Rate threshold |
| Know the specific string you're hunting | Pattern match |
| Just want to be told when things change | Spike detection |
You can have multiple rules per project — count + pattern is a common combination ("page on volume + page on specific bad strings").
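Combining rules is just evaluating each independently over the same window and alerting on every one that fires. A minimal sketch (the helper and its shape are hypothetical, not Watchdog's API):

```python
def evaluate_rules(checks):
    """checks: list of (rule_name, zero-arg callable returning bool).

    Returns the names of every rule that fired; each rule is independent.
    """
    return [name for name, check in checks if check()]
```

For example, a project with a count rule and a pattern rule would run both checks each evaluation cycle and notify on whichever subset fired.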
