SigSentry

Rule types

Error count, error rate, pattern match, and spike detection — when to use each

Every Watchdog rule has one of four types. Pick based on what "something's wrong" looks like in your logs.

Error count threshold

Fires when the number of error-level logs in the lookback window exceeds a fixed count.

| Field | Notes |
| --- | --- |
| Count threshold | The number that triggers the alert |
| Lookback minutes | How far back to count (default 30) |

Best for: services with steady, low-error baselines where any spike is interesting. Good first rule for new projects.

Example: "Notify me when checkout-api logs more than 50 errors in 30 minutes."

Count thresholds don't account for traffic — at peak load, even a healthy service might exceed the threshold. Use error rate if your traffic varies a lot.
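A minimal sketch of how a count-threshold rule evaluates. The field names (`countThreshold`, `lookbackMinutes`) are assumptions based on this page, not SigSentry's actual schema:

```typescript
// Hypothetical sketch; field names are assumed, not the real schema.
interface CountRule {
  countThreshold: number;  // the count that, once exceeded, triggers the alert
  lookbackMinutes: number; // how far back to count (default 30)
}

// errorCount is the number of error-level logs in the lookback window.
function countRuleFires(rule: CountRule, errorCount: number): boolean {
  return errorCount > rule.countThreshold;
}
```

Note the strict inequality: with a threshold of 50, exactly 50 errors stays silent and the 51st fires.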

Error rate threshold

Fires when the ratio of error logs to total logs exceeds a percentage.

| Field | Notes |
| --- | --- |
| Rate threshold | Decimal between 0 and 1 — e.g., 0.05 for 5% |
| Min error count | Floor below which the rule won't fire even if the rate is exceeded (default 5) — prevents false positives at low traffic |
| Lookback minutes | How far back to evaluate (default 30) |

Best for: services with variable traffic where the proportion of errors matters more than the absolute count.

Example: "Notify me when checkout-api error rate exceeds 5% over 30 minutes — but only if there are at least 20 errors total."

The min-error-count guard is important. Without it, a service that saw 1 request and 1 error would have a 100% error rate and trigger the alert.
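The guard can be sketched like this. Field names are assumptions, not SigSentry's actual schema:

```typescript
// Hypothetical sketch of rate-threshold evaluation; field names are assumed.
interface RateRule {
  rateThreshold: number;  // decimal between 0 and 1, e.g. 0.05 for 5%
  minErrorCount: number;  // never fire below this many errors (default 5)
  lookbackMinutes: number;
}

function rateRuleFires(rule: RateRule, errorCount: number, totalCount: number): boolean {
  if (totalCount === 0) return false;                // no traffic, nothing to measure
  if (errorCount < rule.minErrorCount) return false; // low-traffic guard
  return errorCount / totalCount > rule.rateThreshold;
}
```

With `minErrorCount: 5`, the one-request-one-error case is a 100% rate but stays silent.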

Pattern match (regex)

Fires when any log line in the lookback window matches a regex pattern.

| Field | Notes |
| --- | --- |
| Patterns | One or more { label, regex } entries — first match fires the rule |
| Lookback minutes | How far back to scan (default 30) |

Best for: known-bad strings you've seen before — circuit-breaker trips, OOM, specific error codes, deprecated API hits.

Example patterns:

{ label: "OOM", regex: "OutOfMemoryError" }
{ label: "Stripe failure", regex: "stripe.*Connection refused" }
{ label: "Deprecated endpoint", regex: "DEPRECATED_API_HIT" }

The label is what gets shown in the alert. Regex is evaluated against the log message field.

Test your patterns with dry-run before enabling — a slightly-too-broad regex can match thousands of lines and fire constantly.
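The "first match fires" behavior can be sketched as follows. The `{ label, regex }` shape mirrors the examples above; the function name and everything else is assumed:

```typescript
// Hypothetical sketch of pattern-match evaluation.
interface Pattern {
  label: string; // shown in the alert
  regex: string; // evaluated against the log message field
}

// Returns the label of the first pattern that matches any message in the
// window ("first match fires the rule"), or null if nothing matches.
function firstMatchingLabel(patterns: Pattern[], messages: string[]): string | null {
  for (const p of patterns) {
    const re = new RegExp(p.regex);
    if (messages.some((m) => re.test(m))) return p.label;
  }
  return null;
}
```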

Spike detection

Fires when error volume in the lookback window exceeds a multiple of the baseline from the previous comparable window.

| Field | Notes |
| --- | --- |
| Spike multiplier | The factor by which the current window must exceed the prior baseline (default 3.0) |
| Min error count | Floor for the current window (default 5) |
| Lookback minutes | The window length (default 30) |

Best for: "I don't know what normal looks like, but I want to know when it changes." Good for services where steady state is variable and absolute thresholds don't work.

Example: "Notify me when error volume in the last 30 minutes is 3× higher than the 30 minutes before that — and at least 5 errors."
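A sketch of that comparison. The field names and the zero-baseline handling are assumptions, not SigSentry's documented behavior:

```typescript
// Hypothetical sketch of spike detection; field names and the
// zero-baseline rule are assumed.
interface SpikeRule {
  spikeMultiplier: number; // factor the current window must exceed (default 3.0)
  minErrorCount: number;   // floor for the current window (default 5)
  lookbackMinutes: number; // window length; baseline is the prior window
}

function spikeRuleFires(rule: SpikeRule, currentErrors: number, baselineErrors: number): boolean {
  if (currentErrors < rule.minErrorCount) return false; // below the floor
  // Assumption: any floor-clearing volume after a silent prior window counts as a spike.
  if (baselineErrors === 0) return true;
  return currentErrors > baselineErrors * rule.spikeMultiplier;
}
```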

Choosing between them

| If you... | Use |
| --- | --- |
| Have a quiet service that should never error | Count threshold |
| Have a busy service where ratios matter more | Rate threshold |
| Know the specific string you're hunting | Pattern match |
| Just want to be told when things change | Spike detection |

You can have multiple rules per project — count + pattern is a common combination ("page on volume + page on specific bad strings").
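For instance, that count + pattern combination on one project might look like the fragment below. The `type` strings and field names are illustrative, not SigSentry's actual schema:

```typescript
// Illustrative only: a count rule plus a pattern rule on one project.
const projectRules = [
  { type: "error_count", countThreshold: 50, lookbackMinutes: 30 },
  {
    type: "pattern_match",
    lookbackMinutes: 30,
    patterns: [{ label: "OOM", regex: "OutOfMemoryError" }],
  },
];
```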