Column Profiling

The Profiling page at /data-quality/profiling runs statistical profiles on every column of a target table — distributions, null rates, distinct counts, type inference, and pattern detection. Use it to understand a new dataset, baseline expectations, or spot quality issues without writing any rules.

What's profiled

Per column, Clone-Xs computes:

Stat	Notes
Type	Inferred + declared
Null count / rate	Across the full table
Distinct count	Approximate (HyperLogLog) for large tables
Min / max	For numeric and date columns
Mean / stddev	Numeric only
Quantiles	P25, P50, P75, P95, P99
Top values	Top 20 by frequency
Patterns	Regex inference (e.g. emails, UUIDs, phone numbers)
Cardinality class	Categorical / continuous / unique
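
To make the table concrete, here is a minimal sketch of how several of these stats could be computed for one in-memory column. It is not Clone-Xs's implementation: `profile_column` and its output keys are hypothetical, the distinct count is exact rather than an HLL estimate, and the quantile interpolation method is an assumption.

```python
import statistics
from collections import Counter


def profile_column(values, top_n=20):
    """Compute a few of the per-column stats for one column.

    `values` is a list in which None marks a null. Illustrative only.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    profile = {
        "null_count": total - len(non_null),
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(counts),  # exact here; a large table would use an estimate
        "top_values": counts.most_common(top_n),
    }
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Only add numeric stats when every non-null value is numeric.
    if numeric and len(numeric) == len(non_null):
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = statistics.fmean(numeric)
        if len(numeric) >= 2:
            profile["stddev"] = statistics.stdev(numeric)
            # n=100 yields 99 cut points; index k-1 is the k-th percentile.
            cuts = statistics.quantiles(numeric, n=100, method="inclusive")
            profile["quantiles"] = {
                f"P{k}": cuts[k - 1] for k in (25, 50, 75, 95, 99)
            }
    return profile


print(profile_column([1, 2, 2, None, 5]))
```

Type inference, pattern detection, and the cardinality class are left out of this sketch because the source does not describe how they are derived.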

Running a profile

Pick a table and click Profile. The page calls:

POST /data-quality/profiling
{ "table_fqn": "prod_warehouse.sales.customers" }
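
The same call can be made from a script. The sketch below only builds the request using Python's standard library. The host name is a placeholder, and any authentication or URL prefix your deployment requires is not covered by this page, so treat those parts as assumptions.

```python
import json
import urllib.request

# Hypothetical host; replace with wherever your Clone-Xs instance is served.
BASE_URL = "https://clone-xs.example.com"


def build_profile_request(table_fqn, base_url=BASE_URL):
    """Build (but do not send) the POST request that starts a profile run."""
    body = json.dumps({"table_fqn": table_fqn}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/data-quality/profiling",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_profile_request("prod_warehouse.sales.customers")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```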

For tables of 1 GB or more, the profiler samples by default, taking the smaller of 1M rows or 5% of the table. To override the sample size:

POST /data-quality/profiling
{ "table_fqn": "...", "sample": "full" }   # or { "rows": 5000000 }
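
As a worked example of the default sampling rule, the sketch below computes how many rows would be profiled. It assumes that "5%" means 5% of the table's row count, that "1 GB" means 2**30 bytes, and that smaller tables are profiled in full. None of these details is confirmed by this page, and `default_sample_rows` is a hypothetical helper.

```python
GIB = 1024 ** 3  # assumption: "1 GB" is read as 2**30 bytes


def default_sample_rows(table_bytes, table_rows):
    """Rows profiled by default under the rule described above.

    Tables under 1 GB are profiled in full; larger tables use the smaller
    of 1,000,000 rows or 5% of the table's rows.
    """
    if table_bytes < GIB:
        return table_rows
    return min(1_000_000, table_rows * 5 // 100)


print(default_sample_rows(2 * GIB, 8_000_000))     # 5% is 400,000 rows -> 400000
print(default_sample_rows(50 * GIB, 400_000_000))  # 5% is 20M rows, capped -> 1000000
```

Under these assumptions the 1M-row cap applies once a table exceeds 20 million rows, because 5% of 20 million is 1 million.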

Result UI

Per-column cards show:

Histogram (numeric) or top-values bar chart (categorical)
Stat table (the metrics above)
Pattern badges (e.g. Email-like, UUID-like)
Expand-to-detail with sample values
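
One plausible way to derive a pattern badge is to test sample values against candidate regexes and show the badge when most values match. The sketch below illustrates that idea only. The regexes, the 90% threshold, and the `pattern_badges` helper are assumptions, not the profiler's documented behavior.

```python
import re

# Illustrative patterns only; the profiler's actual regexes are not documented here.
CANDIDATE_PATTERNS = {
    "Email-like": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "UUID-like": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
}


def pattern_badges(values, min_share=0.9):
    """Return badges whose pattern fully matches at least `min_share` of non-null values."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return []
    badges = []
    for name, pattern in CANDIDATE_PATTERNS.items():
        hits = sum(1 for v in non_null if pattern.fullmatch(v))
        if hits / len(non_null) >= min_share:
            badges.append(name)
    return badges


print(pattern_badges(["a@example.com", "b@example.org", None]))  # ['Email-like']
```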

Saving baselines

Profiles can be saved as a baseline for later comparison (Schema Drift uses the saved baseline to flag changes):

POST /data-quality/profiling/baseline
{ "table_fqn": "...", "name": "2026-04-30_baseline" }
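
If baselines are saved on a schedule, a date-stamped name like the one above keeps them easy to tell apart. The following sketch builds such a request body. `baseline_payload` is a hypothetical helper, and the body contains only the two fields shown on this page; whether the endpoint accepts other fields is not stated.

```python
import datetime
import json


def baseline_payload(table_fqn, day):
    """Build the JSON body for saving a baseline, named after the given date."""
    return {"table_fqn": table_fqn, "name": f"{day.isoformat()}_baseline"}


payload = baseline_payload("prod_warehouse.sales.customers", datetime.date(2026, 4, 30))
print(json.dumps(payload))
# {"table_fqn": "prod_warehouse.sales.customers", "name": "2026-04-30_baseline"}
```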

Generating rules from profiles

The Generate Rules button on each column scaffolds DQX rules from the profile:

Null-rate thresholds from observed null %
Range checks from observed min/max
Pattern checks from inferred regex
Distinct-count expectations from cardinality

You review and edit before saving — nothing is auto-saved.
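
To illustrate the mapping from profile stats to draft rules, here is a rough sketch. The rule dictionaries are a made-up shape for illustration and are not the DQX rule format. The `scaffold_rules` helper, the profile keys, and the one-percentage-point headroom on the null rate are all assumptions.

```python
def scaffold_rules(column, profile, null_slack=0.01):
    """Turn one column's profile stats into draft rule dicts for human review.

    The rule shape is hypothetical and is not the DQX rule format.
    """
    rules = [{
        "column": column,
        "check": "null_rate_at_most",
        # Allow a little headroom over the observed rate, capped at 100%.
        "threshold": min(1.0, round(profile["null_rate"] + null_slack, 4)),
    }]
    if "min" in profile and "max" in profile:
        rules.append({"column": column, "check": "in_range",
                      "min": profile["min"], "max": profile["max"]})
    if profile.get("pattern"):
        rules.append({"column": column, "check": "matches_regex",
                      "regex": profile["pattern"]})
    if profile.get("cardinality_class") == "unique":
        rules.append({"column": column, "check": "is_unique"})
    return rules


draft = scaffold_rules("age", {"null_rate": 0.02, "min": 18, "max": 97})
for rule in draft:
    print(rule)
```

The function returns drafts rather than saving anything, which mirrors the review-before-save flow described above.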

API

POST /data-quality/profiling
GET  /data-quality/profiling/baselines?table_fqn=...
POST /data-quality/profiling/baseline
GET  /data-quality/profiling/{run_id}

Related

Schema Drift: track changes against a baseline
Rules Engine: saved auto-generated rules go here
DQ Suite Overview
