Anomaly Correlation

The Anomaly Correlation page at /data-quality/correlations finds patterns across multiple anomalies. When freshness drops on customers at 02:14 and volume drops on orders and transactions within 30 minutes, that is most likely not three separate problems but one upstream pipeline failure.

Why correlate

Symptom-level alerting (one alert per anomaly) drowns on-call in noise. Root-cause-level alerting (one incident with N correlated anomalies) makes triage fast and identifies the real culprit.

What gets correlated

The correlator looks for clusters by:

  • Time — anomalies within the same window (default 30 min)
  • Lineage — anomalies on tables connected in the Lineage graph
  • Upstream system — anomalies on tables sourced from the same federation connection or cluster
  • Owner — anomalies on tables owned by the same team
  • Schedule — anomalies on tables refreshed by the same job
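To make the first two criteria concrete, here is a minimal sketch that groups anomalies by time window and direct lineage adjacency. It is an illustration under stated assumptions, not the correlator's actual implementation: the Anomaly class, the correlate function, the lineage edges, and the timestamps for orders, transactions, and inventory are all hypothetical, and the upstream system, owner, and schedule criteria are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Anomaly:
    table: str
    kind: str            # e.g. "freshness" or "volume"
    detected_at: datetime


def correlate(anomalies, lineage_edges, time_window_minutes=30, min_cluster_size=2):
    """Group anomalies that are close in time and on directly connected tables.

    lineage_edges is an iterable of (upstream_table, downstream_table) pairs.
    Assumption: only direct lineage neighbors count as "connected" here.
    """
    # Undirected adjacency: a link in either direction relates two tables.
    neighbors = {}
    for up, down in lineage_edges:
        neighbors.setdefault(up, set()).add(down)
        neighbors.setdefault(down, set()).add(up)

    window = timedelta(minutes=time_window_minutes)
    ordered = sorted(anomalies, key=lambda a: a.detected_at)

    # Union-find over anomaly indices, so chains of related anomalies merge.
    parent = list(range(len(ordered)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            b = ordered[j]
            if b.detected_at - a.detected_at > window:
                break  # sorted by time, so later anomalies are out of range too
            if a.table == b.table or b.table in neighbors.get(a.table, set()):
                parent[find(j)] = find(i)

    clusters = {}
    for i, a in enumerate(ordered):
        clusters.setdefault(find(i), []).append(a)
    # Drop groups below the minimum size (singletons by default).
    return [c for c in clusters.values() if len(c) >= min_cluster_size]


# Hypothetical lineage: customers -> orders -> transactions.
edges = [("customers", "orders"), ("orders", "transactions")]
anomalies = [
    Anomaly("customers", "freshness", datetime(2024, 1, 1, 2, 14)),
    Anomaly("orders", "volume", datetime(2024, 1, 1, 2, 20)),
    Anomaly("transactions", "volume", datetime(2024, 1, 1, 2, 35)),
    Anomaly("inventory", "volume", datetime(2024, 1, 1, 2, 20)),  # no lineage link
]
clusters = correlate(anomalies, edges)
```

With this sample data, the three lineage-connected anomalies end up in one cluster, and the unrelated inventory anomaly stays a singleton and is filtered out by min_cluster_size.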

Result

Each correlation cluster shows:

  • Cluster ID + start / end timestamps
  • Anomaly count
  • Tables affected
  • Suspected root — the upstream-most table in the cluster's lineage
  • Suggested action — links to the suspected job, schedule, or pipeline

Click into a cluster to see the full anomaly list and a mini-lineage diagram highlighting the suspected root.
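The "upstream-most table" rule for the suspected root can be sketched as follows. This is an assumption about how such a rule might work, not the product's documented algorithm; the suspected_root function and the example edges are hypothetical. It returns a list because a cluster can have several tables with no upstream parent inside the cluster.

```python
def suspected_root(cluster_tables, lineage_edges):
    """Return the cluster tables that have no upstream parent inside the cluster.

    lineage_edges is an iterable of (upstream_table, downstream_table) pairs.
    Only edges with both ends inside the cluster are considered. If the
    cluster's lineage contains a cycle, the result may be empty.
    """
    tables = set(cluster_tables)
    has_parent_in_cluster = {
        down
        for up, down in lineage_edges
        if up in tables and down in tables and up != down
    }
    return sorted(tables - has_parent_in_cluster)


# Hypothetical lineage: customers -> orders -> transactions.
edges = [("customers", "orders"), ("orders", "transactions")]
roots = suspected_root(["orders", "transactions", "customers"], edges)
```

Here roots contains only customers, since orders and transactions each have an upstream table that is also in the cluster.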

Promote to incident

The Create Incident button wraps the entire cluster as a single Incident with all anomalies linked.

Tuning

clxs.yaml controls:

data_quality:
  correlation:
    time_window_minutes: 30
    lineage_depth: 3
    min_cluster_size: 2

A smaller time window produces tighter clusters but leaves more anomalies as singletons. A larger lineage depth catches more root causes but makes correlation runs slower.
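The cost side of lineage_depth can be illustrated with a depth-limited graph walk. The sketch below assumes that lineage_depth means the maximum number of lineage hops between two tables; the source does not define it precisely, and the tables_within_depth function and the sample chain are hypothetical.

```python
from collections import deque


def tables_within_depth(start, lineage_edges, lineage_depth=3):
    """Return {table: hops} for tables within lineage_depth hops of start.

    Assumption: hops are counted in either direction along lineage edges.
    """
    neighbors = {}
    for up, down in lineage_edges:
        neighbors.setdefault(up, set()).add(down)
        neighbors.setdefault(down, set()).add(up)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if hops[table] == lineage_depth:
            continue  # do not expand past the configured depth
        for nxt in neighbors.get(table, ()):
            if nxt not in hops:
                hops[nxt] = hops[table] + 1
                queue.append(nxt)

    hops.pop(start)
    return hops


# Hypothetical linear chain: a -> b -> c -> d -> e.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
near = tables_within_depth("a", chain, lineage_depth=1)
far = tables_within_depth("a", chain, lineage_depth=3)
```

Each extra hop widens the set of candidate tables the correlator has to compare, which is one plausible reason deeper settings run slower.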

API

GET /data-quality/correlations?status=open
GET /data-quality/correlations/{cluster_id}
POST /data-quality/correlations/{cluster_id}/incident
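A minimal client sketch for these three endpoints, using only the Python standard library. The base URL is a placeholder, and the helper names are hypothetical. Authentication, a request body for the POST, and the response format are not specified in this page, so the sketch sends no credentials or body and only assumes JSON responses in the optional send helper.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder, replace with your deployment


def list_open_request(base_url=BASE_URL):
    """Build GET /data-quality/correlations?status=open."""
    query = urllib.parse.urlencode({"status": "open"})
    url = f"{base_url}/data-quality/correlations?{query}"
    return urllib.request.Request(url, method="GET")


def get_cluster_request(cluster_id, base_url=BASE_URL):
    """Build GET /data-quality/correlations/{cluster_id}."""
    cid = urllib.parse.quote(str(cluster_id), safe="")
    url = f"{base_url}/data-quality/correlations/{cid}"
    return urllib.request.Request(url, method="GET")


def promote_request(cluster_id, base_url=BASE_URL):
    """Build POST /data-quality/correlations/{cluster_id}/incident."""
    cid = urllib.parse.quote(str(cluster_id), safe="")
    url = f"{base_url}/data-quality/correlations/{cid}/incident"
    return urllib.request.Request(url, method="POST")


def send(request):
    """Send a request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building requests does not touch the network; only send() does.
req = promote_request("c-42")  # "c-42" is a made-up cluster ID
```

Keeping request construction separate from sending makes it easy to inspect the URL and method before calling a real deployment.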