MDM Profiling
The MDM Profiling page at /mdm/profiling runs statistical profiles over master-entity attributes: distributions, null rates, distinct counts, and pattern detection. It has the same shape as Column Profiling, but it operates on MDM golden records and is scoped per entity type.
Why profile master data
- Spot quality issues: a high null rate on email means the source isn't filling it; a low distinct count on industry_code may mean stale reference data
- Tune match rules: uneven distributions in matching attributes indicate that the rule weighting needs revisiting
- Validate normalization: see whether the reference-data normalization is collapsing values as expected
- Baseline expectations: capture a profile snapshot to detect drift later
Run a profile
Pick an entity type and click Profile:
POST /mdm/profiling
{ "entity_type": "customer", "sample": 100000 }
For entity types with fewer than 1M entities, the profile runs against the full set; above that, it samples.
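As a rough illustration, the request above could be assembled from a script like this. The base URL is a placeholder and the helper name `build_profile_request` is hypothetical; only the path and the two body fields come from the example. The sketch builds the request without sending it, since authentication and response shape are not described here.

```python
import json
import urllib.request

# Placeholder host (assumption); replace with your own deployment.
BASE_URL = "https://example.invalid"


def build_profile_request(entity_type: str, sample: int) -> urllib.request.Request:
    """Build, but do not send, a POST /mdm/profiling request."""
    # Body fields mirror the documented example; their exact semantics
    # (for instance how `sample` is interpreted) are not specified here.
    body = json.dumps({"entity_type": entity_type, "sample": sample}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/mdm/profiling",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_profile_request("customer", 100000)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it, assuming the host is reachable and any required credentials are added to the headers.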
Per-attribute card
Each attribute gets a card with:
- Type + format
- Null count / rate
- Distinct count
- Top values with frequency
- Pattern detection — email-like, UUID-like, phone-like
- Conformity rate — % matching the reference-data or regex constraint
- Distribution chart — histogram (numeric) or top-values bar (categorical)
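To make the card fields concrete, here is a minimal sketch of how such statistics could be computed for one attribute. It is not the product's implementation: the regexes are illustrative stand-ins for the pattern detectors, and the conformity rate is measured against the detected pattern as an assumption, whereas the real card uses the configured reference-data or regex constraint.

```python
import re
from collections import Counter

# Illustrative patterns only (assumption); the real detectors are not documented here.
PATTERNS = {
    "email-like": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "uuid-like": re.compile(
        r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
    ),
    "phone-like": re.compile(r"^\+?[0-9][0-9 ()\-.]{6,}$"),
}


def profile_attribute(values, top_n=3):
    """Compute card-style statistics for one attribute's values."""
    total = len(values)
    present = [v for v in values if v is not None]
    null_count = total - len(present)
    counts = Counter(present)

    # Detected pattern: the one that matches the most non-null values.
    best_name, best_hits = None, 0
    for name, rx in PATTERNS.items():
        hits = sum(1 for v in present if rx.match(str(v)))
        if hits > best_hits:
            best_name, best_hits = name, hits

    return {
        "null_count": null_count,
        "null_rate": null_count / total if total else 0.0,
        "distinct_count": len(counts),
        "top_values": counts.most_common(top_n),
        "pattern": best_name,
        # Share of non-null values matching the detected pattern.
        "conformity_rate": best_hits / len(present) if present else 0.0,
    }


emails = ["a@x.com", "b@y.org", None, "a@x.com", "not-an-email"]
card = profile_attribute(emails)
print(card)
```

For this sample, one of five values is null, three distinct values remain, and three of the four non-null values look like email addresses.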
Source split
Profiles can be split by source system to compare quality:
- "email is null in 45% of Salesforce records but 2% of Oracle records: the Salesforce ingest is broken"
The split view shows side-by-side cards per source.
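The per-source comparison amounts to grouping records by source before computing each statistic. A small sketch for the null rate, assuming each record carries a `source` field (a hypothetical name, not taken from the product):

```python
from collections import defaultdict


def null_rate_by_source(records, attribute):
    """Return {source: null rate of `attribute`} for a list of record dicts."""
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for rec in records:
        src = rec["source"]  # hypothetical field naming the source system
        totals[src] += 1
        if rec.get(attribute) is None:
            nulls[src] += 1
    return {src: nulls[src] / totals[src] for src in totals}


records = [
    {"source": "salesforce", "email": None},
    {"source": "salesforce", "email": "a@x.com"},
    {"source": "oracle", "email": "b@y.org"},
    {"source": "oracle", "email": "c@z.net"},
]
rates = null_rate_by_source(records, "email")
print(rates)
```

A large gap between sources on the same attribute points at an ingest problem in one system rather than at the attribute itself.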
Baselines
Save a profile as a baseline for drift comparison:
POST /mdm/profiling/baseline
{ "entity_type": "customer", "name": "2026-04-30_baseline" }
MDM Scorecards compares current profiles against the latest baseline to compute the conformity score.
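Drift comparison can be pictured as a per-attribute difference between the baseline and the current profile. The sketch below flags null-rate changes beyond a tolerance; the function name, the dict layout, and the 0.05 tolerance are assumptions for illustration, and this is not the formula MDM Scorecards uses for the conformity score.

```python
def null_rate_drift(baseline, current, tolerance=0.05):
    """Return {attribute: change in null rate} where the change exceeds `tolerance`."""
    drifted = {}
    for attr, base_rate in baseline.items():
        if attr not in current:
            continue  # attribute missing from the current profile; skipped in this sketch
        delta = current[attr] - base_rate
        if abs(delta) > tolerance:
            drifted[attr] = round(delta, 4)
    return drifted


# Null rates per attribute, as fractions (hypothetical numbers).
baseline = {"email": 0.02, "industry_code": 0.10}
current = {"email": 0.45, "industry_code": 0.11}
drift = null_rate_drift(baseline, current)
print(drift)
```

Here only email is flagged, because its null rate rose by 0.43 while industry_code moved by 0.01.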
Generating rules
The Generate Rules button on each attribute scaffolds DQ rules from the profile:
- Null-rate thresholds from observed null %
- Distinct-count expectations
- Conformity rules from inferred patterns
You review and edit before saving — rules land in the DQ Rules Engine.
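The scaffolding step can be thought of as turning observed statistics into draft thresholds. The following is a sketch only: the rule dictionaries, the rule type names, and the 0.05 margin are hypothetical and do not reflect the DQ Rules Engine's actual rule format.

```python
def scaffold_rules(attribute, card, null_margin=0.05):
    """Draft DQ rules for one attribute from its profile card (hypothetical rule shape)."""
    rules = [
        {
            # Allow some headroom above the observed null rate.
            "attribute": attribute,
            "type": "max_null_rate",
            "threshold": round(min(1.0, card["null_rate"] + null_margin), 4),
        },
        {
            # Expect at least as many distinct values as were observed.
            "attribute": attribute,
            "type": "min_distinct_count",
            "threshold": card["distinct_count"],
        },
    ]
    if card.get("pattern"):
        # Conformity rule from the inferred pattern, when one was detected.
        rules.append(
            {"attribute": attribute, "type": "matches_pattern", "pattern": card["pattern"]}
        )
    return rules


card = {"null_rate": 0.02, "distinct_count": 9800, "pattern": "email-like"}
rules = scaffold_rules("email", card)
for rule in rules:
    print(rule)
```

Drafts like these are starting points; the review step matters because an observed value is not necessarily an acceptable one.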
API
POST /mdm/profiling
GET /mdm/profiling/{run_id}
POST /mdm/profiling/baseline
GET /mdm/profiling/baselines
Related
- Column Profiling — non-MDM analogue
- MDM Scorecards — uses these profiles
- Schema Drift — schema-level companion