Dynamic baselines with time-of-day and day-of-week awareness #692

@erikdarlingdata

Summary

Static thresholds ("alert when CPU > 80%") generate alert fatigue because they don't account for normal workload patterns. CPU at 85% might be expected at 2pm Tuesday during month-end processing but alarming at 3am Sunday. Dynamic baselines learn what "normal" looks like for each time window and flag deviations from that pattern.

PerformanceMonitor's `compare_analysis` already does a primitive version of this (compare current 4 hours vs same window 28 hours ago). This issue tracks making baselines time-of-day and day-of-week aware.

Core Concept

  • Collect 30+ days of historical metrics
  • Build per-metric baselines segmented by time-of-day and day-of-week (e.g., "Tuesday 2pm CPU is typically 70-85%")
  • Compute confidence bands (e.g., mean ± 2-3 standard deviations)
  • Flag current values that fall outside the expected band for this specific time window
  • Continuously update baselines as workloads evolve
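
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's implementation: the function names `build_baselines` and `is_outside_band`, the in-memory sample format, and the default of 3 standard deviations are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def build_baselines(samples):
    """Group (timestamp, value) samples by (weekday, hour) and summarize each bucket.

    Returns {(weekday, hour): (mean, std, sample_count)}; weekday 0 is Monday.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)

    baselines = {}
    for key, values in buckets.items():
        # stdev() needs at least two points; treat a single point as zero spread.
        spread = stdev(values) if len(values) > 1 else 0.0
        baselines[key] = (mean(values), spread, len(values))
    return baselines


def is_outside_band(baselines, ts, value, k=3.0):
    """True if value is outside mean ± k*std for ts's bucket; None if no baseline exists."""
    entry = baselines.get((ts.weekday(), ts.hour))
    if entry is None:
        return None
    mu, sigma, _ = entry
    return abs(value - mu) > k * sigma


# Hypothetical history: three Tuesdays at 2pm.
history = [
    (datetime(2024, 1, 2, 14), 70.0),
    (datetime(2024, 1, 9, 14), 80.0),
    (datetime(2024, 1, 16, 14), 75.0),
]
baselines = build_baselines(history)
print(is_outside_band(baselines, datetime(2024, 1, 23, 14), 95.0))  # Tuesday 2pm
print(is_outside_band(baselines, datetime(2024, 1, 21, 3), 95.0))   # Sunday 3am, no data
```

Returning `None` for buckets with no history keeps "no baseline yet" distinct from "within the band", which matters for new installations.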

Which Metrics to Baseline

High value (clear daily/weekly patterns)

  • CPU utilization
  • Batch Requests/sec
  • Wait stats (total wait time per type)
  • Session/connection counts
  • Query duration aggregates

Medium value

  • Memory utilization (tends to be more stable)
  • I/O latency
  • TempDB usage
  • Blocking event counts

Where This Applies

Analysis Engine (both Dashboard and Lite)

The inference engine's fact scoring could incorporate baseline deviation as an amplifier. A CPU reading of 85% with a baseline of 80±5% scores low (normal). The same 85% with a baseline of 40±10% scores high (anomalous). This makes the engine's findings context-aware without changing the rule structure.
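
One way to picture the amplifier is as a multiplier derived from the z-score. The sketch below is hypothetical: `baseline_amplifier`, the cap of 3.0, and the choice to amplify only upward deviations are assumptions for illustration, not the engine's actual scoring API.

```python
def baseline_amplifier(value, mean, std, cap=3.0):
    """Return a score multiplier between 1.0 and cap that grows with upward deviation."""
    if std <= 0:
        # No usable spread: leave the score unchanged.
        return 1.0
    z = (value - mean) / std
    # Readings within one standard deviation above the mean are not amplified.
    return min(cap, max(1.0, z))


# The two cases from the paragraph above.
print(baseline_amplifier(85, 80, 5))   # baseline 80±5: z = 1.0
print(baseline_amplifier(85, 40, 10))  # baseline 40±10: z = 4.5, capped
```

Because the multiplier is applied to an existing fact score, the rules themselves stay untouched; only the weight of a finding changes.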

Alert Thresholds

Instead of fixed thresholds, alerts could fire when "deviation from baseline exceeds N standard deviations." This directly addresses alert fatigue, which a 2024 industry survey cited as the #1 barrier to faster incident response.

Trend Charts (both Dashboard and Lite)

Overlay a shaded "expected range" band on metric charts. Visually, the user sees the metric line and a band showing what's normal. When the line exits the band, something changed. This is the visual equivalent of the annotation markers from issue #688 but for statistical context rather than discrete events.
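
The data behind such an overlay could be produced roughly as follows. This is a sketch under assumptions: `expected_band` and `band_exits` are hypothetical helper names, the baseline is assumed to be a dict keyed by `(weekday, hour)` holding `(mean, std)`, and a band width of 2 standard deviations is an arbitrary default.

```python
from datetime import datetime


def expected_band(baselines, timestamps, k=2.0):
    """For each timestamp return (lower, upper), or None when no baseline exists."""
    band = []
    for ts in timestamps:
        entry = baselines.get((ts.weekday(), ts.hour))
        if entry is None:
            band.append(None)
        else:
            mu, sigma = entry
            band.append((mu - k * sigma, mu + k * sigma))
    return band


def band_exits(points, band):
    """Return the timestamps where the metric line leaves the shaded band."""
    return [
        ts
        for (ts, value), limits in zip(points, band)
        if limits is not None and not (limits[0] <= value <= limits[1])
    ]


# Hypothetical baseline: Tuesday 2pm is typically 75 ± 5.
baselines = {(1, 14): (75.0, 5.0)}
points = [
    (datetime(2024, 1, 23, 14, 0), 78.0),
    (datetime(2024, 1, 23, 14, 30), 90.0),
    (datetime(2024, 1, 23, 15, 0), 50.0),  # no baseline for 3pm in this example
]
band = expected_band(baselines, [ts for ts, _ in points])
print(band_exits(points, band))
```

Gaps in the band (the `None` entries) would simply render as no shading, so charts stay usable while history accumulates.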

compare_analysis Enhancement

The existing `compare_analysis` MCP tool compares two time windows. With baselines, it could compare the current window against the expected baseline for this time of day/week rather than a fixed offset, making the comparison more meaningful.
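
A baseline-aware comparison might reduce the current window to an average z-score against the matching time buckets. The sketch below is illustrative only: `compare_to_baseline` is a hypothetical name and does not reflect the actual `compare_analysis` tool interface.

```python
from datetime import datetime
from statistics import mean


def compare_to_baseline(window, baselines):
    """Average z-score of a window against per-(weekday, hour) baselines.

    window: list of (datetime, value); baselines: {(weekday, hour): (mean, std)}.
    Returns None when no sample has a usable baseline.
    """
    z_scores = []
    for ts, value in window:
        entry = baselines.get((ts.weekday(), ts.hour))
        if entry is None or entry[1] <= 0:
            continue  # skip samples with no baseline or zero spread
        mu, sigma = entry
        z_scores.append((value - mu) / sigma)
    return mean(z_scores) if z_scores else None


# Hypothetical baselines for Tuesday 2pm and 3pm.
baselines = {(1, 14): (75.0, 5.0), (1, 15): (60.0, 10.0)}
window = [
    (datetime(2024, 1, 23, 14, 10), 85.0),
    (datetime(2024, 1, 23, 15, 10), 90.0),
]
print(compare_to_baseline(window, baselines))
```

Unlike a fixed offset, each sample is compared against what is typical for its own hour and weekday, so a window that spans a normal ramp-up does not look anomalous.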

Data Requirements

Dashboard

Historical data is already in the `PerformanceMonitor` SQL Server database. Baseline computation could be a scheduled calculation (SQL Agent job or application-level) that maintains a baseline table with per-metric, per-hour-of-day, per-day-of-week statistics.

Lite

Historical data is in DuckDB/Parquet. Baseline computation could run as part of the collector cycle or on-demand. DuckDB's analytical query capabilities make time-bucketed aggregation efficient.

Both apps need at least 2-4 weeks of data before baselines become meaningful. New installations should gracefully degrade to static thresholds until sufficient history exists.
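
The fallback could be as simple as checking how many samples back the baseline before trusting it. In this sketch, `should_alert`, the `min_samples` default of 4, and the `(mean, std, sample_count)` tuple are assumptions chosen for illustration.

```python
def should_alert(value, static_threshold, baseline=None, min_samples=4, k=3.0):
    """Use the dynamic band when the baseline has enough history, else the static threshold.

    baseline: optional (mean, std, sample_count) for the current time bucket.
    """
    if baseline is not None:
        mu, sigma, n = baseline
        if n >= min_samples and sigma > 0:
            return abs(value - mu) > k * sigma
    # Not enough history (or zero spread): degrade to the static rule.
    return value > static_threshold


print(should_alert(85, 80))                              # new install, static rule
print(should_alert(85, 80, baseline=(80.0, 5.0, 2)))     # too few samples, static rule
print(should_alert(85, 80, baseline=(80.0, 5.0, 8)))     # within the expected band
print(should_alert(85, 80, baseline=(40.0, 10.0, 8)))    # far outside the expected band
```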

Design Notes

  • Start simple: mean and standard deviation per metric per hour-of-day per day-of-week
  • More sophisticated approaches (seasonal decomposition, exponential smoothing) can come later
  • The baseline computation itself is not computationally expensive — it's aggregating data that's already stored
  • The UX challenge is communicating "this is unusual for this time" vs "this crossed a fixed threshold" — the shaded band on charts is the clearest way
  • Applies to both Dashboard and Lite, plus MCP analysis tools
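
As a sketch of what a later exponential-smoothing variant might look like, the function below updates a bucket's mean and variance incrementally, so recent observations gradually outweigh old ones. The name `ew_update` and the smoothing factor `alpha` are assumptions; the initial proposal remains a plain mean and standard deviation.

```python
def ew_update(mean, var, x, alpha=0.1):
    """Exponentially weighted update of a running mean and variance.

    alpha in (0, 1]: larger values adapt faster to recent observations.
    Returns (new_mean, new_var).
    """
    diff = x - mean
    incr = alpha * diff
    new_mean = mean + incr
    new_var = (1.0 - alpha) * (var + diff * incr)
    return new_mean, new_var


# Hypothetical bucket that starts at mean 10 with no spread, then sees a value of 20.
mean, var = 10.0, 0.0
mean, var = ew_update(mean, var, 20.0, alpha=0.5)
print(mean, var ** 0.5)  # updated mean and standard deviation
```

Only two numbers per metric and time bucket need to be stored, which keeps the baseline table small on both SQL Server and DuckDB.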
