Reconciliation

Reconciliation answers one question: "Is the cloned table the same as the source?" The Reconciliation page at /data-quality/reconciliation runs row-level, column-level, and deep-diff comparisons between two tables and reports exactly what differs.

Four modes

The page has four sub-tabs (each lives at its own URL):

| Mode | URL | What it does |
| --- | --- | --- |
| Row-level | `/data-quality/reconciliation/row-level` | Compare row counts and primary-key set membership |
| Column-level | `/data-quality/reconciliation/column-level` | Per-column hash compare (detects value drift) |
| Deep diff | `/data-quality/reconciliation/deep` | Full row-by-row and per-cell diff (slowest, most precise) |
| Run history | `/data-quality/reconciliation/history` | Past reconciliations with results |

Row-level

The cheapest mode. It compares only row counts and primary-key (PK) sets, and reports:

Rows in source not in target
Rows in target not in source
Total counts on both sides

POST /reconciliation/row-level
{ "source_fqn": "...", "target_fqn": "...", "primary_keys": ["id"] }

Useful as a quick "did the clone copy everything?" check.
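
To make the row-level result concrete, here is a minimal local sketch of the same comparison on in-memory rows. It is not the service's implementation: the function name `row_level_diff` and the result keys are hypothetical, chosen only to mirror the three items listed above.

```python
from typing import Any, Iterable, Mapping, Sequence


def row_level_diff(
    source: Iterable[Mapping[str, Any]],
    target: Iterable[Mapping[str, Any]],
    primary_keys: Sequence[str],
) -> dict:
    """Compare row counts and primary-key membership (illustrative only)."""
    source_rows = list(source)
    target_rows = list(target)

    # One tuple per row, so composite primary keys work the same way.
    source_keys = {tuple(row[k] for k in primary_keys) for row in source_rows}
    target_keys = {tuple(row[k] for k in primary_keys) for row in target_rows}

    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": sorted(source_keys - target_keys),
        "missing_in_source": sorted(target_keys - source_keys),
    }


# Hypothetical sample data: id 1 was not copied, id 4 exists only in the target.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 2}, {"id": 3}, {"id": 4}]
result = row_level_diff(source, target, ["id"])
```

Note that equal counts alone do not prove the key sets match, which is why both set differences are reported alongside the totals.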

Column-level

Compares one hash per column. For each column, it computes a hash over all values (ordered by PK) on both sides and compares the two hashes. If they match, the column is treated as identical and no row-by-row comparison is needed.

POST /reconciliation/column-level
{ "source_fqn": "...", "target_fqn": "...", "columns": ["email", "amount"] }

The result lists which columns differ — not which rows. Use deep diff to drill down.
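
The per-column hash idea can be sketched locally as follows. This is an assumption-laden illustration, not the service's algorithm: SHA-256, the `repr`-based value encoding, and the helper names `column_hash` and `differing_columns` are all choices made here for the example.

```python
import hashlib
from typing import Any, Iterable, List, Mapping, Sequence


def column_hash(
    rows: Iterable[Mapping[str, Any]], column: str, primary_keys: Sequence[str]
) -> str:
    """Hash every value of one column, visiting rows in primary-key order."""
    ordered = sorted(rows, key=lambda row: tuple(row[k] for k in primary_keys))
    digest = hashlib.sha256()  # assumed hash function; the source does not name one
    for row in ordered:
        # repr() keeps None, "" and 0 distinct; the separator marks value boundaries.
        digest.update(repr(row[column]).encode("utf-8"))
        digest.update(b"\x1f")
    return digest.hexdigest()


def differing_columns(
    source: Iterable[Mapping[str, Any]],
    target: Iterable[Mapping[str, Any]],
    columns: Sequence[str],
    primary_keys: Sequence[str],
) -> List[str]:
    """Return the columns whose hashes differ between source and target."""
    source_rows, target_rows = list(source), list(target)
    return [
        c
        for c in columns
        if column_hash(source_rows, c, primary_keys)
        != column_hash(target_rows, c, primary_keys)
    ]


# Hypothetical sample data: same rows in a different order, one amount changed.
source = [
    {"id": 1, "email": "a@example.com", "amount": 10},
    {"id": 2, "email": "b@example.com", "amount": 20},
]
target = [
    {"id": 2, "email": "b@example.com", "amount": 25},
    {"id": 1, "email": "a@example.com", "amount": 10},
]
drifted = differing_columns(source, target, ["email", "amount"], ["id"])
```

Sorting by primary key first is what makes the hash independent of physical row order, so a clone that stores rows differently still matches.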

Deep diff

The most expensive mode. It joins source and target on the PKs and reports every cell-level difference, plus rows missing on either side. Use it sparingly: for large tables it runs as a streaming Spark job.

POST /reconciliation/deep
{
  "source_fqn": "...",
  "target_fqn": "...",
  "primary_keys": ["id"],
  "max_diffs": 10000,
  "ignore_columns": ["updated_at"]
}

`max_diffs` caps the number of reported differences to keep memory bounded. Use `ignore_columns` to exclude fields that legitimately drift (timestamps, audit columns).
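
As a rough illustration of how the request parameters interact, the sketch below runs a cell-level diff over in-memory rows. It is a simplified stand-in for the Spark job, not its implementation; the function name `deep_diff` and the shape of each diff record are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def deep_diff(
    source: Iterable[Mapping[str, Any]],
    target: Iterable[Mapping[str, Any]],
    primary_keys: Sequence[str],
    max_diffs: int = 10000,
    ignore_columns: Sequence[str] = (),
) -> List[dict]:
    """Join on primary keys and collect at most max_diffs differences."""

    def key_of(row: Mapping[str, Any]) -> tuple:
        return tuple(row[k] for k in primary_keys)

    source_by_key = {key_of(r): r for r in source}
    target_by_key = {key_of(r): r for r in target}
    skipped = set(ignore_columns) | set(primary_keys)

    diffs: List[dict] = []
    for key in sorted(source_by_key.keys() | target_by_key.keys()):
        if len(diffs) >= max_diffs:
            break  # cap reached: stop early to keep the output bounded
        src, tgt = source_by_key.get(key), target_by_key.get(key)
        if src is None or tgt is None:
            kind = "missing_in_source" if src is None else "missing_in_target"
            diffs.append({"key": key, "kind": kind})
            continue
        for column in sorted((src.keys() | tgt.keys()) - skipped):
            if src.get(column) != tgt.get(column):
                diffs.append({
                    "key": key,
                    "kind": "cell",
                    "column": column,
                    "source": src.get(column),
                    "target": tgt.get(column),
                })
                if len(diffs) >= max_diffs:
                    break
    return diffs


# Hypothetical sample data: updated_at differs everywhere but is ignored.
source = [
    {"id": 1, "amount": 10, "updated_at": "t1"},
    {"id": 2, "amount": 20, "updated_at": "t1"},
]
target = [
    {"id": 1, "amount": 15, "updated_at": "t2"},
    {"id": 3, "amount": 30, "updated_at": "t2"},
]
diffs = deep_diff(source, target, ["id"], ignore_columns=["updated_at"])
capped = deep_diff(source, target, ["id"], max_diffs=1, ignore_columns=["updated_at"])
```

Without `ignore_columns`, the `updated_at` mismatch on row 1 would be reported as well, which is the kind of noise the parameter is meant to suppress.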

Run history

Lists past runs with mode, source/target, duration, mismatch counts, and download buttons for the diff CSV.

Scheduling

Schedule recurring reconciliations via Recon Schedules. A common cadence is row-level hourly, column-level daily, and deep diff weekly.

API summary

POST /reconciliation/row-level
POST /reconciliation/column-level
POST /reconciliation/deep
GET  /reconciliation/runs
GET  /reconciliation/runs/{id}/diff   # download diff CSV
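
A minimal sketch of calling the POST endpoints from Python's standard library is shown below. Several things here are assumptions rather than documented facts: the base URL is a placeholder, the table names are invented, the JSON content type is inferred from the request bodies above, and authentication and the response format are not covered by this page. The helper `build_recon_request` is hypothetical and only constructs the request; it does not send it.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder; replace with your deployment's API root

POST_MODES = {"row-level", "column-level", "deep"}


def build_recon_request(
    mode: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one reconciliation mode."""
    if mode not in POST_MODES:
        raise ValueError(f"unknown reconciliation mode: {mode!r}")
    return urllib.request.Request(
        url=f"{base_url}/reconciliation/{mode}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Table names below are invented for illustration.
request = build_recon_request(
    "deep",
    {
        "source_fqn": "catalog.schema.orders",
        "target_fqn": "catalog.schema.orders_clone",
        "primary_keys": ["id"],
        "max_diffs": 10000,
        "ignore_columns": ["updated_at"],
    },
)

# To actually send it (requires a reachable server and any auth your setup needs):
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or log before a potentially expensive deep diff is started.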

Related

Diff & Compare: schema-level diff (no data)
Recon Schedules: recurring runs
DQ Suite Overview
