Skip to main content

Diff & Compare

Catalog diff

Docs: Information Schema

When to use: You want to quickly see what objects exist in the source but are missing in the destination (or vice versa) at the object level — schemas, tables, views, functions, volumes.

Real-world scenario: After a week of development, several new tables were added to production. You want to see exactly what's missing in staging before running a sync.

clxs diff --source production --dest staging

Output:

======================================================================
CATALOG DIFF: production vs staging
======================================================================

SCHEMAS:
Source: 12 | Dest: 11 | Common: 11
Missing in destination (1):
+ analytics_v2

TABLES:
Source: 250 | Dest: 247 | Common: 247
Missing in destination (3):
+ analytics_v2.daily_metrics
+ analytics_v2.weekly_rollup
+ sales.new_promotions

VIEWS:
Source: 15 | Dest: 15 | Common: 15
In sync

FUNCTIONS:
In sync

VOLUMES:
In sync
======================================================================
Differences found: 4 missing in dest, 0 extra in dest

Diff output

clxs diff --source production --dest staging

Deep compare

Docs: DESCRIBE DETAIL | Information Schema COLUMNS

When to use: A shallow diff says both catalogs have the same 247 tables. But you suspect some table schemas might have diverged. Deep compare checks column definitions and row counts for every matching table.

clxs compare --source production --dest staging

Schema drift detection

Docs: Information Schema COLUMNS

When to use: After a production deployment added new columns to several tables, you want to check if your staging catalog is still schema-compatible before running integration tests.

clxs schema-drift --source production --dest staging

Output:

============================================================
SCHEMA DRIFT REPORT: production vs staging
============================================================
Tables checked: 247
Tables with drift: 3

sales.orders:
Columns in source only: ['discount_pct', 'promo_code']

hr.employees:
Column 'salary' modified: {'data_type': {'source': 'DECIMAL(12,2)', 'dest': 'DECIMAL(10,2)'}}

analytics.daily_metrics:
Column order differs
============================================================

Data profiling

Docs: Information Schema COLUMNS | Aggregate functions

When to use: You want to understand data quality across your catalog — null percentages, distinct counts, value ranges — either for the source catalog or to verify clone quality.

Real-world scenario: Before migrating to a new data platform, the data governance team needs a data quality report: which columns have high null rates, what are the value ranges, and how many distinct values exist per column.

# Profile entire catalog
clxs profile --source production

# Save results to JSON for further analysis
clxs profile --source production --output reports/prod_profile.json

Output:

============================================================
PROFILING SUMMARY: production
============================================================
Tables profiled: 247
Total rows: 15,432,891
sales.orders: high null columns (>50%): discount_pct, promo_code
hr.employees: high null columns (>50%): middle_name, termination_date
============================================================

What gets profiled per column

Column TypeStats
All typesNull count, null %, distinct count
Numeric (INT, DOUBLE, DECIMAL...)Min, max, avg
StringMin length, max length, avg length
Date/TimestampMin, max

Validation

Docs: Information Schema TABLES | Information Schema COLUMNS

When to use: After cloning, verify that the destination tables have the same data as the source — row counts and optionally data checksums.

Real-world scenario: You cloned production to staging for QA testing. Before the QA team starts, you need to verify every table has the correct row count. For critical tables, you also want hash-based checksum validation.

# Row count validation
clxs validate --source production --dest staging

# With checksum (slower but catches data corruption)
clxs validate --source production --dest staging --checksum

Output:

============================================================
VALIDATION SUMMARY: production vs staging
============================================================
Total tables: 247
Matched: 245
Mismatched: 2
Errors: 0
Mismatched tables:
sales.daily_agg: source=1043289 dest=1043201
hr.payroll: source=15232 dest=15230
============================================================

Automated validation after clone

# Clone + auto-validate in one command
clxs clone --validate --checksum