Diff & Compare

Catalog diff

Docs: Information Schema

When to use: You want to quickly see what objects exist in the source but are missing in the destination (or vice versa) at the object level — schemas, tables, views, functions, volumes.

Real-world scenario: After a week of development, several new tables were added to production. You want to see exactly what's missing in staging before running a sync.

clxs diff --source production --dest staging

Output:

======================================================================
CATALOG DIFF: production vs staging
======================================================================

  SCHEMAS:
    Source: 12  |  Dest: 11  |  Common: 11
    Missing in destination (1):
      + analytics_v2

  TABLES:
    Source: 250  |  Dest: 247  |  Common: 247
    Missing in destination (3):
      + analytics_v2.daily_metrics
      + analytics_v2.weekly_rollup
      + sales.new_promotions

  VIEWS:
    Source: 15  |  Dest: 15  |  Common: 15
    In sync

  FUNCTIONS:
    In sync

  VOLUMES:
    In sync
======================================================================
Differences found: 4 missing in dest, 0 extra in dest

Diff output

clxs diff --source production --dest staging

Deep compare

Docs: DESCRIBE DETAIL | Information Schema COLUMNS

When to use: A shallow diff says both catalogs have the same 247 tables. But you suspect some table schemas might have diverged. Deep compare checks column definitions and row counts for every matching table.

clxs compare --source production --dest staging

Schema drift detection

Docs: Information Schema COLUMNS

When to use: After a production deployment added new columns to several tables, you want to check if your staging catalog is still schema-compatible before running integration tests.

clxs schema-drift --source production --dest staging

Output:

============================================================
SCHEMA DRIFT REPORT: production vs staging
============================================================
  Tables checked:    247
  Tables with drift: 3

  sales.orders:
    Columns in source only: ['discount_pct', 'promo_code']

  hr.employees:
    Column 'salary' modified: {'data_type': {'source': 'DECIMAL(12,2)', 'dest': 'DECIMAL(10,2)'}}

  analytics.daily_metrics:
    Column order differs
============================================================

Data profiling

Docs: Information Schema COLUMNS | Aggregate functions

When to use: You want to understand data quality across your catalog — null percentages, distinct counts, value ranges — either for the source catalog or to verify clone quality.

Real-world scenario: Before migrating to a new data platform, the data governance team needs a data quality report: which columns have high null rates, what are the value ranges, and how many distinct values exist per column.

# Profile entire catalog
clxs profile --source production

# Save results to JSON for further analysis
clxs profile --source production --output reports/prod_profile.json

Output:

============================================================
PROFILING SUMMARY: production
============================================================
  Tables profiled: 247
  Total rows:      15,432,891
  sales.orders: high null columns (>50%): discount_pct, promo_code
  hr.employees: high null columns (>50%): middle_name, termination_date
============================================================

What gets profiled per column

Column Type	Stats
All types	Null count, null %, distinct count
Numeric (INT, DOUBLE, DECIMAL...)	Min, max, avg
String	Min length, max length, avg length
Date/Timestamp	Min, max

Validation

Docs: Information Schema TABLES | Information Schema COLUMNS

When to use: After cloning, verify that the destination tables have the same data as the source — row counts and optionally data checksums.

Real-world scenario: You cloned production to staging for QA testing. Before the QA team starts, you need to verify every table has the correct row count. For critical tables, you also want hash-based checksum validation.

# Row count validation
clxs validate --source production --dest staging

# With checksum (slower but catches data corruption)
clxs validate --source production --dest staging --checksum

Output:

============================================================
VALIDATION SUMMARY: production vs staging
============================================================
  Total tables:  247
  Matched:       245
  Mismatched:    2
  Errors:        0
  Mismatched tables:
    sales.daily_agg: source=1043289 dest=1043201
    hr.payroll: source=15232 dest=15230
============================================================

Automated validation after clone

# Clone + auto-validate in one command
clxs clone --validate --checksum

Catalog diff​

Diff output​

Deep compare​

Schema drift detection​

Data profiling​

What gets profiled per column​

Validation​

Automated validation after clone​

Catalog diff

Diff output

Deep compare

Schema drift detection

Data profiling

What gets profiled per column

Validation

Automated validation after clone