Validation & Preflight

Pre-flight checks

Docs: SQL Warehouses | Databricks SDK for Python

When to use: Before starting a long-running clone, verify that everything is in place: credentials are set, the workspace is reachable, the SQL warehouse is running, the source and destination catalogs are accessible, and you have write permissions.

Real-world scenario: Your clone job runs at 2 AM via a scheduled workflow. Instead of failing 30 minutes in because the warehouse was stopped, the pre-flight check catches it immediately and fails fast.

# Run all checks
clxs preflight

Output:

============================================================
PRE-FLIGHT CHECK RESULTS
============================================================
  [PASS] env_vars: DATABRICKS_HOST and TOKEN set
  [PASS] connectivity: 12 catalogs accessible
  [PASS] warehouse: my-warehouse (RUNNING)
  [PASS] source_access (production): Accessible (1 schema(s) readable)
  [PASS] destination_access (staging): Accessible (1 schema(s) readable)
  [PASS] write_permissions: Can create/drop schemas
------------------------------------------------------------
  6 passed, 0 warnings, 0 failed
============================================================
All critical checks passed. Ready to proceed.
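The report follows a simple pattern. Every check runs and records PASS, WARN, or FAIL, and only FAIL blocks the clone. The following is a minimal sketch of that pattern under stated assumptions, not the clxs implementation. The names run_checks, CheckResult, and print_report and the stub checks are hypothetical. In a real tool each check would query the workspace (for example through the Databricks SDK for Python). The CLI would also exit with a non-zero status when ok is False, so a scheduled job stops immediately.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PASS, WARN, FAIL = "PASS", "WARN", "FAIL"

# Hypothetical check signature: no arguments, returns (status, message).
Check = Callable[[], Tuple[str, str]]


@dataclass
class CheckResult:
    name: str
    status: str  # PASS, WARN, or FAIL
    message: str


def run_checks(checks: List[Tuple[str, Check]]) -> Tuple[List[CheckResult], bool]:
    """Run every check in order; only FAIL results block the clone."""
    results = []
    for name, check in checks:
        try:
            status, message = check()
        except Exception as exc:  # an unexpected error counts as a failure
            status, message = FAIL, f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(name, status, message))
    ok = all(r.status != FAIL for r in results)
    return results, ok


def print_report(results: List[CheckResult], ok: bool) -> None:
    """Print a summary shaped like the sample output above."""
    for r in results:
        print(f"  [{r.status}] {r.name}: {r.message}")
    counts = {s: sum(r.status == s for r in results) for s in (PASS, WARN, FAIL)}
    print(f"  {counts[PASS]} passed, {counts[WARN]} warnings, {counts[FAIL]} failed")
    print("All critical checks passed." if ok else "Critical checks failed.")


if __name__ == "__main__":
    # Hypothetical stub checks standing in for real workspace calls.
    demo = [
        ("warehouse", lambda: (PASS, "my-warehouse (RUNNING)")),
        ("destination_access", lambda: (WARN, "catalog staging does not exist yet")),
    ]
    print_report(*run_checks(demo))
```

Catching exceptions inside run_checks means that one unreachable endpoint is reported as a single FAIL line. The remaining checks still run and report.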

tip

If the destination catalog doesn't exist yet, the pre-flight check reports a warning rather than a failure, because the clone command creates the catalog automatically.

# Skip the write permission check (e.g., for read-only analysis commands)
clxs preflight --no-write-check
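The --no-write-check flag skips the write_permissions probe, which the sample output describes as "Can create/drop schemas". This page does not say how clxs performs that probe. The following is a minimal sketch that assumes it creates and then drops a throwaway schema. The function probe_write_permissions and its execute argument are hypothetical; execute stands in for whatever runs one SQL statement on the warehouse.

```python
import uuid
from typing import Callable, Tuple


def probe_write_permissions(execute: Callable[[str], None], catalog: str) -> Tuple[str, str]:
    """Hypothetical probe: create a uniquely named schema, then drop it.

    `execute` is assumed to run a single SQL statement and raise on error.
    """
    schema = f"`{catalog}`.`preflight_probe_{uuid.uuid4().hex[:8]}`"
    try:
        execute(f"CREATE SCHEMA {schema}")
    except Exception as exc:
        return "FAIL", f"Cannot create schemas in {catalog}: {exc}"
    try:
        execute(f"DROP SCHEMA {schema}")
    except Exception as exc:
        # The schema exists but could not be cleaned up; flag it for the user.
        return "WARN", f"Created {schema} but could not drop it: {exc}"
    return "PASS", "Can create/drop schemas"
```

The random suffix keeps concurrent pre-flight runs from colliding on the same schema name.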

Automate it

Run the pre-flight check as a pipeline step before the clone. With &&, the shell runs clxs clone only if clxs preflight exits successfully:

clxs preflight && clxs clone
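If your pipeline step is a Python script rather than a shell command, you can reproduce the && behaviour by stopping at the first non-zero exit status. The helper below is a hypothetical sketch named run_chain, not part of clxs.

```python
import subprocess
from typing import List, Sequence


def run_chain(commands: Sequence[List[str]]) -> int:
    """Run commands in order, like `cmd1 && cmd2` in a POSIX shell.

    Stops at the first command that exits non-zero and returns the exit
    status of the last command that ran (0 if every command succeeded).
    """
    status = 0
    for cmd in commands:
        status = subprocess.run(cmd).returncode
        if status != 0:
            break
    return status
```

For example, run_chain([["clxs", "preflight"], ["clxs", "clone"]]) starts the clone only when the pre-flight command succeeds. The caller can pass the returned status to sys.exit so the scheduler sees the failure.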

Post-clone validation

Docs: Information Schema TABLES

When to use: After cloning, verify that the destination tables contain the same data as the source by comparing row counts and, optionally, data checksums.

Real-world scenario: You cloned production to staging for QA testing. Before the QA team starts, you need to verify that every table has the correct row count.

# Row count validation
clxs validate --source production --dest staging

# With checksum (slower, but also catches corrupted or altered data when row counts match)
clxs validate --source production --dest staging --checksum

Output:

============================================================
VALIDATION SUMMARY: production vs staging
============================================================
  Total tables:  247
  Matched:       245
  Mismatched:    2
  Errors:        0
  Mismatched tables:
    sales.daily_agg: source=1043289 dest=1043201
    hr.payroll: source=15232 dest=15230
============================================================
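The summary compares each table's source and destination results and sorts tables into matched, mismatched, and errors. This page does not describe how clxs computes its checksum. The sketch below is a minimal illustration under stated assumptions. It uses an order-independent checksum: each row is hashed, and the digests are summed modulo 2**256, so scan order does not matter but duplicate rows still count. The functions table_checksum and summarize are hypothetical. A real tool would compute counts and checksums on the warehouse instead of on in-memory rows.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

# table name -> (row_count, checksum or None when checksums are disabled)
Stats = Dict[str, Tuple[int, Optional[str]]]


def table_checksum(rows: Iterable[tuple]) -> str:
    """Order-independent checksum: sum of per-row SHA-256 digests mod 2**256."""
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
    return f"{total:064x}"


def summarize(source: Stats, dest: Stats) -> dict:
    """Classify each source table as matched, mismatched, or an error."""
    matched, mismatched, errors = [], [], []
    for table, (src_count, src_sum) in sorted(source.items()):
        if table not in dest:
            errors.append(table)  # e.g. missing or unreadable in the destination
            continue
        dst_count, dst_sum = dest[table]
        if src_count != dst_count or src_sum != dst_sum:
            mismatched.append((table, src_count, dst_count))
        else:
            matched.append(table)
    return {"total": len(source), "matched": matched,
            "mismatched": mismatched, "errors": errors}
```

When checksums are disabled, both sides carry None, so only row counts decide the result. When checksums are enabled, a table with equal counts but different content is still reported as mismatched.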

Automated validation after clone

# Clone + auto-validate in one command
clxs clone --validate --checksum

Cost estimation

Docs: DESCRIBE DETAIL

When to use: Before running a deep clone, estimate how much additional storage it will cost so you can get budget approval or choose a shallow clone instead.

Real-world scenario: Your finance team asks: "How much will it cost to maintain a deep clone of the production catalog?" You run the estimator to get a dollar figure.

# Default pricing ($0.023/GB/month — AWS S3 standard)
clxs estimate --source production

# Custom pricing
clxs estimate --source production --price-per-gb 0.03

Output:

============================================================
COST ESTIMATION: production (DEEP CLONE)
============================================================
  Total tables:    247
  Total size:      1.84 TB
  Estimated monthly storage cost: $43.38/month
  Estimated annual cost:          $520.56/year
============================================================
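The estimate is simple arithmetic. The monthly cost is the size in GB multiplied by the per-GB monthly price, and the annual cost is 12 times the monthly cost (12 × $43.38 = $520.56). The sketch below assumes 1 GB = 1024**3 bytes and assumes a deep clone adds one full copy of the source data. It is not the clxs code, and estimate_storage_cost is a hypothetical name. The displayed size is rounded. Recomputing from exactly 1.84 TB under these assumptions gives about $43.34, close to the $43.38 shown.

```python
from typing import Tuple


def estimate_storage_cost(size_bytes: int, price_per_gb_month: float = 0.023) -> Tuple[float, float]:
    """Return (monthly, annual) storage cost in dollars for a full data copy.

    Assumes binary units (1 GB = 1024**3 bytes) and no compression savings.
    """
    size_gb = size_bytes / 1024**3
    monthly = size_gb * price_per_gb_month
    return monthly, monthly * 12
```

Passing price_per_gb_month=0.03 corresponds to the --price-per-gb 0.03 example above.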
