
Use Cases

A catalog of common scenarios mapped to the right combination of Clone-Xs features. Start here if you know what you want to do but aren't sure which flags, flows, or guides apply.

Each use case lists:

  • The problem — what situation you're in
  • Recommended approach — which features to combine
  • Starting point — a minimal config or command to adapt

Disaster recovery

DR replica across regions (same metastore)

Problem: Your production catalog lives in eastus and you need a hot standby in westus that stays close to real-time. Both workspaces share the same Unity Catalog metastore.

Approach: Same-workspace DEEP CLONE for the initial snapshot, then Incremental Sync on a cron for ongoing catch-up. Delta time-travel on the destination gives you point-in-time recovery.

source_catalog: "prod"
destination_catalog: "prod_dr"
clone_type: "DEEP"
enable_rollback: true
validate_after_clone: true

Then schedule clxs incremental-sync --source prod --dest prod_dr every 15 minutes.
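One way to use Delta time travel for point-in-time recovery on the replica is to issue `RESTORE TABLE` statements against the destination catalog. Below is a minimal sketch that only builds those statements. The `restore_statements` helper and the `sales.orders` table name are hypothetical, and this is not how Clone-Xs itself performs recovery. RESTORE can only reach versions whose data files have not yet been vacuumed.

```python
from datetime import datetime


def restore_statements(catalog: str, tables: list[str], as_of: datetime) -> list[str]:
    """Build one Delta RESTORE statement per table.

    `tables` holds 'schema.table' names inside `catalog`.
    Hypothetical helper for illustration, not part of Clone-Xs.
    """
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return [f"RESTORE TABLE {catalog}.{t} TO TIMESTAMP AS OF '{ts}'" for t in tables]


for stmt in restore_statements("prod_dr", ["sales.orders"], datetime(2026, 4, 19, 2, 59)):
    print(stmt)
```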

Cross-cloud DR (AWS → Azure)

Problem: Compliance requires your DR site on a different cloud from production. The two workspaces are on different Unity Catalog metastores.

Approach: Cross-workspace migration via Delta Sharing + DEEP CLONE. The source workspace creates a share; the target consumes it and materializes the data into its own cloud storage. Run it weekly on a schedule to keep a warm replica.

{
  "source_catalog": "retail_prod",
  "destination_catalog": "retail_prod_dr",
  "target_workspace": {
    "host": "https://adb-target.azuredatabricks.net",
    "auth_method": "pat",
    "token": "dapi...",
    "warehouse_id": "abc123"
  }
}
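The share-then-clone flow can be pictured as two lists of SQL statements: one run on the source workspace and one on the target. This minimal sketch builds them with standard Databricks Delta Sharing SQL (`CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `GRANT SELECT ON SHARE`, `CREATE CATALOG ... USING SHARE`) followed by `DEEP CLONE`. It assumes the recipient and provider already exist and the destination schemas are created. The helper, the share/recipient/provider names, and the `_mounted` catalog suffix are hypothetical. The statements Clone-Xs actually issues may differ.

```python
def sharing_plan(
    share: str,
    recipient: str,
    provider: str,
    source_catalog: str,
    dest_catalog: str,
    tables: list[str],
) -> tuple[list[str], list[str]]:
    """Return (source_sql, target_sql) for a share-then-DEEP-CLONE copy.

    `tables` are 'schema.table' names. Without an AS alias, a shared table
    keeps its schema.table name inside the share by default.
    """
    mounted = f"{share}_mounted"  # hypothetical name for the catalog created from the share

    # Run on the source workspace.
    source_sql = [f"CREATE SHARE IF NOT EXISTS {share}"]
    source_sql += [f"ALTER SHARE {share} ADD TABLE {source_catalog}.{t}" for t in tables]
    source_sql.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")

    # Run on the target workspace: mount the share, then materialize each table.
    target_sql = [f"CREATE CATALOG IF NOT EXISTS {mounted} USING SHARE {provider}.{share}"]
    target_sql += [
        f"CREATE OR REPLACE TABLE {dest_catalog}.{t} DEEP CLONE {mounted}.{t}"
        for t in tables
    ]
    return source_sql, target_sql


src_sql, tgt_sql = sharing_plan(
    "retail_share", "azure_dr", "aws_prod", "retail_prod", "retail_prod_dr", ["sales.orders"]
)
print("\n".join(src_sql + tgt_sql))
```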

Development & testing

Dev sandbox from production snapshot

Problem: Developers need a production-like catalog to experiment in, but you don't want to pay storage for a full copy and you don't need persistence beyond a sprint.

Approach: SHALLOW CLONE + TTL. The clone is a metadata-only pointer to the source data files (near-zero storage cost) and auto-expires after N days.

clxs clone \
  --source production \
  --dest dev_sandbox \
  --clone-type SHALLOW \
  --ttl 14d
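Here is how a day-based TTL such as `14d` maps to an expiry moment, assuming it is counted from the clone's creation time. The `parse_ttl` and `expires_at` helpers are hypothetical and handle only the `d` suffix used in this guide. They are not how Clone-Xs tracks or enforces expiry.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_ttl(ttl: str) -> timedelta:
    """Parse a day-based TTL such as '14d' into a timedelta.

    Only the 'd' suffix used in this guide is handled; other units raise.
    """
    match = re.fullmatch(r"(\d+)d", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported TTL format: {ttl!r}")
    return timedelta(days=int(match.group(1)))


def expires_at(created: datetime, ttl: str) -> datetime:
    """Moment at which a clone created at `created` would expire."""
    return created + parse_ttl(ttl)


created = datetime(2026, 4, 19, tzinfo=timezone.utc)
print(expires_at(created, "14d"))  # 2026-05-03 00:00:00+00:00
```

Counting in whole days also means the `2555d` TTL in the audit example below is exactly 7 × 365 days. It therefore ends a day or two before the seven-year calendar anniversary once leap days are counted.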

CI test catalog per pull request

Problem: Each PR spins up an isolated test catalog for integration tests, then tears it down. Keeping a SQL warehouse running 24/7 for this is wasteful.

Approach: Serverless clone + SHALLOW + auto-rollback. No warehouse needed; the serverless job starts on demand. Pair with schema_only: true if the tests don't need data.

clxs clone \
  --source production \
  --dest pr_${PR_NUMBER} \
  --serverless \
  --volume /Volumes/ops/libs/clone_xs \
  --clone-type SHALLOW \
  --ttl 3d

Schema migration dry run

Problem: You're about to rename columns and refactor tables in production, and you want to validate the migration against the real structure first.

Approach: Set schema_only: true to create the destination catalog, schemas, and empty tables (no data); run the migration scripts against it; then diff the result.

source_catalog: "production"
destination_catalog: "migration_test"
schema_only: true
copy_constraints: true
copy_comments: true
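The "diff the result" step is easiest to picture as a comparison of two column snapshots shaped as `{"schema.table": {"column": "TYPE"}}`. You could collect these, for example, from each catalog's `information_schema.columns`. Below is a minimal sketch of that comparison. The helper and the sample tables are hypothetical, and this is not a Clone-Xs feature.

```python
Snapshot = dict[str, dict[str, str]]  # {"schema.table": {"column": "TYPE"}}


def diff_schemas(before: Snapshot, after: Snapshot) -> dict:
    """Report table- and column-level differences between two snapshots."""
    report = {
        "added_tables": sorted(after.keys() - before.keys()),
        "dropped_tables": sorted(before.keys() - after.keys()),
        "changed_tables": {},
    }
    for table in sorted(before.keys() & after.keys()):
        old, new = before[table], after[table]
        changes = {
            "added": sorted(new.keys() - old.keys()),
            "dropped": sorted(old.keys() - new.keys()),
            "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
        }
        if any(changes.values()):  # only report tables that actually changed
            report["changed_tables"][table] = changes
    return report


before = {"orders.line_items": {"qty": "INT", "cust_id": "BIGINT"}, "orders.legacy": {"id": "INT"}}
after = {"orders.line_items": {"quantity": "INT", "cust_id": "STRING"}}
print(diff_schemas(before, after))
```

A column rename shows up as one dropped column plus one added column, because the snapshots carry no rename information.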

QA environment for destructive tests

Problem: Your QA team runs destructive stress tests (deletes, updates, schema drops). They can't risk touching production files even through a shallow clone.

Approach: DEEP clone, which gives QA fully independent data. Optionally set validate_after_clone: true to confirm parity before QA starts.

source_catalog: "production"
destination_catalog: "qa_full"
clone_type: "DEEP"
validate_after_clone: true
generate_report: true

Compliance & governance

PII-masked dev copy

Problem: Developers need realistic production data but compliance requires PII masking. Raw prod data can't leave the locked-down workspace.

Approach: Data masking during the clone: PII columns pass through a masking function before they are written. Pair it with Scope Picker to copy only the schemas developers need.

source_catalog: "prod"
destination_catalog: "dev_masked"
clone_type: "DEEP"
include_schemas: ["orders", "customers"]
masking:
  "customers.email": "hash_email"
  "customers.phone": "redact_phone"

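The config names masking functions but does not show what they do. Below is a minimal sketch of what functions called `hash_email` and `redact_phone` might compute, written in plain Python for illustration. The keyed-hash and keep-last-four choices are assumptions, as is the `SECRET_KEY` placeholder. Neither reflects Clone-Xs's built-in implementations.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; load from a secret store in practice


def hash_email(email: str) -> str:
    """Deterministic keyed hash (HMAC-SHA256) of a normalized email address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def redact_phone(phone: str) -> str:
    """Keep only the last four digits; replace every other digit with '*'."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]


print(redact_phone("+1 (555) 010-4477"))  # *******4477
```

A keyed hash keeps joins on email working, because the same address always maps to the same token. Without the key, nobody can reverse the tokens with a dictionary of known addresses.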
Audit-ready snapshot

Problem: A regulator asks for the exact state of a catalog at month-end. You need a tamper-resistant, frozen copy with a full manifest.

Approach: DEEP clone + generate_report: true + TTL set to the retention period. The report is the audit manifest (what was copied, counts, checksums, timestamps).

source_catalog: "sales"
destination_catalog: "sales_2026_01_close"
clone_type: "DEEP"
as_of_timestamp: "2026-01-31T23:59:59"
validate_after_clone: true
validate_checksum: true
generate_report: true
ttl: "2555d" # 7 years
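One common way to build a table checksum that does not depend on scan order is to hash each row and add the hashes together. That lets source and destination be compared even when rows come back in different orders. This is a minimal sketch of that technique. The guide does not say how validate_checksum computes its values, so treat `table_checksum` as hypothetical.

```python
import hashlib
from typing import Iterable, Sequence


def table_checksum(rows: Iterable[Sequence]) -> int:
    """Order-independent checksum: sum of per-row SHA-256 prefixes modulo 2**64."""
    total = 0
    for row in rows:
        canonical = "\x1f".join(repr(v) for v in row)  # unit separator between fields
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest[:8], "big")) % 2**64
    return total


rows = [(1, "a"), (2, None)]
print(table_checksum(rows) == table_checksum(list(reversed(rows))))  # True
```

Summing (rather than XOR-ing) the row hashes keeps duplicate rows from cancelling each other out. Both sides must read values into the same types for the `repr` strings to match.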

GDPR right-to-be-forgotten verification

Problem: A customer files a GDPR erasure request. Before running the delete, you want an isolated verification copy of their data to confirm the delete was complete.

Approach: Use Scope Picker to select only the tables that hold the customer's data, then run the RTBF workflow on the copy.
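To confirm the delete was complete, you can count the customer's remaining rows in each copied table. This minimal sketch builds one `COUNT(*)` query per table using a named parameter marker (`:customer_id`), then flags any table that still returns rows. The helpers, the `customer_id` column name, and the `rtbf_check` catalog are hypothetical. The RTBF workflow may verify erasure differently.

```python
def verification_queries(catalog: str, tables: list[str], key_column: str = "customer_id") -> dict[str, str]:
    """One COUNT query per table; `:customer_id` is a named parameter marker."""
    return {
        t: f"SELECT COUNT(*) AS remaining FROM {catalog}.{t} WHERE {key_column} = :customer_id"
        for t in tables
    }


def incomplete_erasures(remaining: dict[str, int]) -> list[str]:
    """Tables that still hold rows for the customer after the delete."""
    return sorted(t for t, n in remaining.items() if n > 0)


queries = verification_queries("rtbf_check", ["customers.profiles", "orders.line_items"])
print(incomplete_erasures({"customers.profiles": 0, "orders.line_items": 2}))  # ['orders.line_items']
```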


Data migration

Cross-cloud workload migration

Problem: The business has decided to move production workloads from AWS to Azure. Every catalog, schema, table, view, function, volume, and permission has to land on the new side.

Approach: Cross-workspace migration. The full-fidelity Delta Sharing → DEEP CLONE pipeline handles all object types and replays metadata. Run it per catalog, validate, then cut over DNS and clients.

Workspace consolidation

Problem: A merger brought five team workspaces under one organization. You want all their catalogs in one consolidated workspace.

Approach: Run five cross-workspace migrations, one per source workspace, all landing in the same target workspace. Avoid namespace collisions by giving each job its own destination_catalog name.

for src in team_a team_b team_c team_d team_e; do
  clxs clone \
    --source "$src" \
    --dest "${src}_migrated" \
    --target-host https://adb-target.azuredatabricks.net \
    --target-token "$DAPI" \
    --target-warehouse "$WH_ID"
done
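Before launching the loop, it can help to check that none of the `<src>_migrated` names already exist on the target. You could list those names, for example, with `SHOW CATALOGS` on the target workspace. This is a minimal pre-flight sketch. The `plan_destinations` helper is hypothetical and not part of the clxs CLI.

```python
def plan_destinations(sources: list[str], existing: set[str], suffix: str = "_migrated") -> dict[str, str]:
    """Map each source catalog to its destination name, refusing any collision."""
    if len(set(sources)) != len(sources):
        raise ValueError("duplicate source catalog in the list")
    plan = {src: f"{src}{suffix}" for src in sources}
    clashes = sorted(dest for dest in plan.values() if dest in existing)
    if clashes:
        raise ValueError(f"destination catalogs already exist: {clashes}")
    return plan


print(plan_destinations(["team_a", "team_b"], existing={"main", "system"}))
```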

Pre-deletion audit copy

Problem: You're about to drop a 5-year-old catalog that hasn't been touched in months. Legal wants a frozen copy retained before the drop.

Approach: DEEP CLONE into a legal_hold_* catalog with a long TTL and auto_rollback_on_failure: false, so a validation failure does not revert the copy and whatever landed is kept.


Ongoing operations

Weekly staging refresh

Problem: Staging drifts from prod as developers create throwaway tables and leave them. Every Monday you want staging to match prod structurally.

Approach: Two-way Sync with --drop-extra. It reconciles structure and removes destination tables that don't exist in the source.

clxs sync --source production --dest staging --drop-extra

Schedule via Scheduling or a Databricks workflow.
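Because `--drop-extra` deletes tables, you may want to preview what it would remove. Given the table names of each catalog (for example from `information_schema.tables`), the candidates are simply the destination tables missing from the source. This is a minimal preview sketch under that assumption. The helpers are hypothetical and do not reproduce the sync command's own logic.

```python
def extra_tables(source_tables: set[str], dest_tables: set[str]) -> list[str]:
    """'schema.table' names present in the destination but not in the source."""
    return sorted(dest_tables - source_tables)


def drop_preview(dest_catalog: str, source_tables: set[str], dest_tables: set[str]) -> list[str]:
    """DROP statements you would expect a drop-extra pass to perform."""
    return [f"DROP TABLE IF EXISTS {dest_catalog}.{t}" for t in extra_tables(source_tables, dest_tables)]


src = {"orders.line_items", "orders.customers"}
dst = {"orders.line_items", "orders.customers", "orders.tmp_alice", "scratch.test1"}
print(drop_preview("staging", src, dst))
```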

Near-real-time prod → staging sync

Problem: You want staging to trail production by minutes, not days — but full re-clones take hours.

Approach: Enable Change Data Feed on the source tables, then run Incremental Sync in CDF mode every 5 minutes. Only the changed rows flow through.

-- One-time: enable CDF on each source table
ALTER TABLE prod.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

Then, on a 5-minute cron:

clxs incremental-sync --source prod --dest staging --sync-mode cdf
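CDF mode needs a primary key because change rows must be matched to destination rows. Delta's change feed tags each row with `_change_type` (`insert`, `update_preimage`, `update_postimage`, `delete`) and `_commit_version`. The minimal sketch below applies such rows to an in-memory `{pk: row}` snapshot to show the idea. It illustrates the technique only; Incremental Sync works on tables, and its actual merge logic is not shown in this guide.

```python
CDF_METADATA = {"_change_type", "_commit_version", "_commit_timestamp"}


def apply_changes(snapshot: dict, changes: list[dict], key: str) -> dict:
    """Apply Delta CDF rows, in commit order, to a {primary_key: row} snapshot."""
    result = dict(snapshot)  # leave the caller's snapshot untouched
    for change in sorted(changes, key=lambda c: c["_commit_version"]):
        row = {k: v for k, v in change.items() if k not in CDF_METADATA}
        kind = change["_change_type"]
        if kind in ("insert", "update_postimage"):
            result[row[key]] = row  # upsert the new row image
        elif kind == "delete":
            result.pop(row[key], None)
        # "update_preimage" rows only describe the old values, so they are skipped
    return result


snapshot = {1: {"id": 1, "amount": 10}}
changes = [
    {"id": 2, "amount": 5, "_change_type": "insert", "_commit_version": 3},
    {"id": 1, "amount": 10, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "amount": 15, "_change_type": "update_postimage", "_commit_version": 2},
]
print(apply_changes(snapshot, changes, key="id"))
```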

Roll back a bad clone

Problem: A clone job ran but the destination looks wrong. You want to undo the clone non-destructively.

Approach: Rollback using the rollback log. It uses Delta RESTORE TABLE to return pre-existing tables to their pre-clone versions and drops tables the clone newly created.

clxs rollback --log-file rollback_logs/rollback_staging_20260419_103000.json
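The rollback described above comes down to two kinds of Delta statements. This minimal sketch derives them from log entries. It assumes a hypothetical entry shape, `{"table": ..., "pre_clone_version": int or None}`, where `None` marks a table the clone created. The real rollback log's JSON fields are not shown in this guide, so adapt the field names.

```python
def rollback_statements(entries: list[dict]) -> list[str]:
    """Undo entries in reverse order: RESTORE pre-existing tables, DROP new ones."""
    statements = []
    for entry in reversed(entries):  # undo the most recent change first
        table = entry["table"]
        version = entry["pre_clone_version"]
        if version is None:
            statements.append(f"DROP TABLE IF EXISTS {table}")
        else:
            statements.append(f"RESTORE TABLE {table} TO VERSION AS OF {int(version)}")
    return statements


log = [
    {"table": "staging.orders.line_items", "pre_clone_version": 41},
    {"table": "staging.orders.returns", "pre_clone_version": None},
]
print("\n".join(rollback_statements(log)))
```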

Auto-recover from failed validation

Problem: You run scheduled clones unattended overnight. If a clone completes but validation finds > 5% row-count drift, you want to automatically undo it rather than leave a half-broken destination.

Approach: Auto-rollback on failure, with a drift threshold in percent.

validate_after_clone: true
auto_rollback_on_failure: true
rollback_threshold: 5.0
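Here is the arithmetic behind a 5.0 threshold, assuming drift is measured per table as |dest − source| / source × 100 and a clone trips when any table exceeds it. This is a minimal sketch. The helpers are hypothetical, and the exact drift formula Clone-Xs uses is not stated here.

```python
def drift_pct(source_rows: int, dest_rows: int) -> float:
    """Row-count drift of the destination relative to the source, in percent."""
    if source_rows == 0:
        return 0.0 if dest_rows == 0 else float("inf")
    return abs(dest_rows - source_rows) * 100.0 / source_rows


def tables_over_threshold(counts: dict[str, tuple[int, int]], threshold: float = 5.0) -> list[str]:
    """Tables whose (source_rows, dest_rows) drift exceeds the threshold."""
    return sorted(t for t, (src, dst) in counts.items() if drift_pct(src, dst) > threshold)


counts = {"orders.line_items": (1_000_000, 940_000), "orders.returns": (50_000, 49_000)}
print(tables_over_threshold(counts))  # ['orders.line_items']
```

Here `orders.line_items` drifts 6% and `orders.returns` 2%, so only the first would trigger a rollback.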

Delta Live Tables

Clone a DLT pipeline to a DR workspace

Problem: Your production DLT pipeline runs in one workspace. You need a DR copy ready to switch on in another workspace.

Approach: DLT pipeline clone (cross-workspace). It copies the pipeline definition and libraries; the clone lands in development mode for safe review.

Duplicate a pipeline for A/B testing

Problem: You want to test a logic change in your Silver DLT pipeline without touching the live one.

Approach: Clone the pipeline in the same workspace under a new name, modify the cloned libraries, and run both side by side against the same Bronze source.
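Both DLT scenarios above come down to copying a pipeline's settings under a new name. This minimal sketch assumes you fetched those settings with the Databricks Pipelines REST API (`GET /api/2.0/pipelines/{pipeline_id}`, whose response carries a `spec` object) and will create the copy with `POST /api/2.0/pipelines`. The `clone_pipeline_spec` helper is hypothetical and is not how Clone-Xs implements DLT clone. For an A/B run you will likely want a different target schema, so the copy does not write to the same tables as the live pipeline.

```python
import copy
from typing import Optional


def clone_pipeline_spec(spec: dict, new_name: str, overrides: Optional[dict] = None) -> dict:
    """Return a new pipeline spec derived from `spec`, leaving the original untouched."""
    clone = copy.deepcopy(spec)
    clone.pop("id", None)          # the API assigns a fresh pipeline ID on create
    clone["name"] = new_name
    clone["development"] = True    # land in development mode for review
    clone.update(overrides or {})  # e.g. a different target schema for A/B runs
    return clone


live = {
    "id": "1234",
    "name": "silver_orders",
    "development": False,
    "target": "silver",
    "libraries": [{"notebook": {"path": "/Repos/etl/silver"}}],
}
print(clone_pipeline_spec(live, "silver_orders_ab", {"target": "silver_ab"}))
```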


Advanced scenarios

Time-travel clone from a known-good state

Problem: Bad data landed in production at 3 AM; you need a clean snapshot from 2:59 AM.

Approach: Time-travel clone — DEEP CLONE with as_of_timestamp set to before the bad data landed.

clone_type: "DEEP"
as_of_timestamp: "2026-04-19T02:59:00"
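At the table level, a time-travel clone corresponds to Delta's `CREATE TABLE ... CLONE` with a `TIMESTAMP AS OF` or `VERSION AS OF` clause on the source. This minimal sketch builds that statement for one table. The helper and the table names are hypothetical, and Clone-Xs may issue the statement differently.

```python
from typing import Optional


def time_travel_clone_sql(
    source: str,
    dest: str,
    as_of_timestamp: Optional[str] = None,
    as_of_version: Optional[int] = None,
    clone_type: str = "DEEP",
) -> str:
    """Build a Delta CLONE statement pinned to one point in the source's history."""
    if (as_of_timestamp is None) == (as_of_version is None):
        raise ValueError("set exactly one of as_of_timestamp or as_of_version")
    if clone_type not in ("DEEP", "SHALLOW"):
        raise ValueError(f"unknown clone_type: {clone_type}")
    if as_of_timestamp is not None:
        clause = f"TIMESTAMP AS OF '{as_of_timestamp}'"
    else:
        clause = f"VERSION AS OF {int(as_of_version)}"
    return f"CREATE OR REPLACE TABLE {dest} {clone_type} CLONE {source} {clause}"


print(time_travel_clone_sql("prod.sales.orders", "prod_clean.sales.orders",
                            as_of_timestamp="2026-04-19T02:59:00"))
```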

Partial copy (only 2024 data)

Problem: Analytics team only needs the 2024 partition for their model; the full history is 8 TB.

Approach: WHERE-clause clone (DEEP only; the filtered copy does not keep the source's Delta history).

clone_type: "DEEP"
where_clauses:
  "*": "year = 2024"
  "orders.events": "year = 2024 AND region = 'US'"

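Delta `CLONE` has no filter clause, so a filtered copy is typically written as `CREATE TABLE ... AS SELECT ... WHERE`. That produces a new table with its own history, which is consistent with the loss of Delta history noted above. This minimal sketch assumes that approach and these lookup rules: keys are `schema.table`, and an exact key wins over the `"*"` default. Both are assumptions about Clone-Xs, and the helpers are hypothetical.

```python
from typing import Optional


def resolve_where(where_clauses: dict[str, str], table: str) -> Optional[str]:
    """Exact 'schema.table' key wins; otherwise fall back to the '*' default."""
    return where_clauses.get(table, where_clauses.get("*"))


def filtered_copy_sql(source_catalog: str, dest_catalog: str, table: str,
                      where_clauses: dict[str, str]) -> str:
    """CTAS with an optional predicate; the result is a new table with fresh history."""
    query = f"SELECT * FROM {source_catalog}.{table}"
    predicate = resolve_where(where_clauses, table)
    if predicate:
        query += f" WHERE {predicate}"
    return f"CREATE OR REPLACE TABLE {dest_catalog}.{table} AS {query}"


clauses = {"*": "year = 2024", "orders.events": "year = 2024 AND region = 'US'"}
print(filtered_copy_sql("prod", "prod_2024", "orders.events", clauses))
```

Note that a `"*"` predicate applies to every table in scope, so each of those tables needs the referenced column (here `year`).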
Clone only selected schemas/objects

Problem: An engineering team owns 3 of the 30 schemas in prod. They want a dev copy with just those 3 schemas.

Approach: Scope Picker (UI) or include_schemas + include_objects (API/YAML) for granular selection.

source_catalog: "prod"
destination_catalog: "prod_dev"
include_objects:
- { schema: "orders", name: "line_items", type: "table" }
- { schema: "orders", name: "customers", type: "table" }
- { schema: "marketing", name: "v_campaigns", type: "view" }
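This minimal sketch shows one plausible way `include_schemas` and `include_objects` could combine. It assumes union semantics: an object is in scope if its whole schema is included or it matches a listed `(schema, name, type)` entry, and everything is in scope when neither selector is set. These rules are assumptions, not documented Clone-Xs behavior, and `in_scope` is hypothetical.

```python
from typing import Iterable, Mapping

KEY_FIELDS = ("schema", "name", "type")


def in_scope(
    obj: Mapping[str, str],
    include_schemas: Iterable[str] = (),
    include_objects: Iterable[Mapping[str, str]] = (),
) -> bool:
    """Hypothetical union semantics: whole schemas OR individually listed objects."""
    schemas = set(include_schemas)
    objects = list(include_objects)
    if not schemas and not objects:
        return True  # no selectors: everything is in scope (assumption)
    if obj["schema"] in schemas:
        return True
    return any(all(obj[k] == sel[k] for k in KEY_FIELDS) for sel in objects)


selection = [
    {"schema": "orders", "name": "line_items", "type": "table"},
    {"schema": "orders", "name": "customers", "type": "table"},
    {"schema": "marketing", "name": "v_campaigns", "type": "view"},
]
catalog_objects = [
    {"schema": "orders", "name": "line_items", "type": "table"},
    {"schema": "orders", "name": "returns", "type": "table"},
    {"schema": "marketing", "name": "v_campaigns", "type": "view"},
]
print([o["name"] for o in catalog_objects if in_scope(o, include_objects=selection)])
```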

Choosing the right clone type — quick reference

| If you want… | Use |
|---|---|
| Full independent copy (destructive-safe) | DEEP clone |
| Fast metadata-only reference | SHALLOW clone |
| Copy only changed tables since last run | Incremental sync (version mode) |
| Copy only changed rows since last run | Incremental sync (CDF mode; requires a primary key and CDF on the source) |
| Keep dest structure aligned with source | Two-way Sync |
| Move catalog across workspaces / clouds | Cross-workspace migration |
| Spin up a catalog without a SQL warehouse | Serverless clone |
| Undo a previous clone | Rollback |
| Copy a DLT pipeline (definition only) | DLT pipeline clone |
| Copy data as of an earlier point in time | Time-travel clone (as_of_timestamp / as_of_version) |
| Subset of data by predicate | WHERE-clause clone (DEEP only) |
| Subset of schemas / specific objects | Scope Picker + include_objects |

See the configuration reference for every config field that appears in these scenarios.