
CLI Reference

Global flags

These flags are available on all subcommands:

| Flag | Description |
| --- | --- |
| -c, --config | Path to config YAML (default: config/clone_config.yaml) |
| --warehouse-id | SQL warehouse ID (overrides config) |
| -v, --verbose | Enable debug logging |
| --profile | Config profile to use |
| --log-file | Write logs to file |
| --host | Databricks workspace URL |
| --token | Databricks personal access token |
| --auth-profile | Databricks CLI profile from ~/.databrickscfg |
| --verify-auth | Verify authentication before running |
| --login | Interactive browser login |

--catalog alias

The following single-catalog commands accept --catalog as an alias for --source:

stats, storage-metrics, optimize, vacuum, profile, export, search, snapshot, estimate, cost-estimate, dep-graph, usage-analysis, sample, view-deps, pii-scan, state

For example, clxs stats --catalog edp_dev is equivalent to clxs stats --source edp_dev.


Commands

clone

Clone an entire Unity Catalog catalog.

clxs clone [options]
| Flag | Description |
| --- | --- |
| --source | Source catalog name |
| --dest | Destination catalog name |
| --clone-type | DEEP or SHALLOW (default: DEEP) |
| --load-type | FULL or INCREMENTAL (default: FULL) |
| --include-schemas | Comma-separated list of schemas to include |
| --include-tables-regex | Regex pattern for table inclusion |
| --exclude-tables-regex | Regex pattern for table exclusion |
| --as-of-timestamp | Time travel (version or timestamp) |
| --max-workers | Parallel workers (default: 4) |
| --dry-run | Preview SQL without executing |
| --validate | Validate after cloning |
| --enable-rollback | Save rollback log |
| --report | Generate summary report |
| --progress | Show progress bar |
| --no-permissions | Skip copying grants and access controls (copied by default) |
| --no-tags | Skip copying tags (copied by default) |
| --no-ownership | Skip copying ownership (copied by default) |
| --location | Managed storage location for new catalog |
| --dest-host | Destination workspace URL (cross-workspace) |
| --dest-token | Destination workspace token (cross-workspace) |
| --template | Apply a clone template (e.g., dev-refresh, dr-replica) |
| --where | Global WHERE filter for all tables (deep clone only) |
| --table-filter | Per-table WHERE filter: schema.table:condition (repeatable) |
| --auto-rollback | Auto-rollback if validation fails |
| --rollback-threshold | Max mismatch % before rollback (default: 5) |
| --throttle | Throttle profile: low, medium, high, max |
| --checkpoint | Enable periodic checkpointing |
| --resume-from-checkpoint | Resume from checkpoint file |
| --require-approval | Require approval before cloning |
| --impact-check | Run impact analysis before cloning |
| --ttl | Set TTL on destination (e.g., 7d, 30d) |
| --skip-unused | Skip tables with no recent queries |
| --schema-only | Create empty tables (structure only, no data) with all other artifacts |
| --target-format | Destination format: DELTA (default) or ICEBERG (UniForm-readable). See the clone guide, "target format". |
| --iceberg-physical | With --target-format ICEBERG, swap UniForm for CREATE TABLE … USING iceberg AS SELECT so UC reports the destination as Data source: Iceberg. Loses Delta history. |
| --auto-mask-pii | Auto-detect PII columns via UC column_tags and mask post-clone via the existing src/masking.py pipeline. |
| --enable-retry / --no-retry | Auto-retry transient failures (network, throttle, 5xx). Default on. |
| --compare-dq-after-clone | Run column-level DQ comparison after each schema is cloned; combined with --auto-rollback, triggers RESTORE if drift exceeds --dq-drift-rollback-pct. |
| --dq-drift-rollback-pct | Drift threshold in percent for --compare-dq-after-clone (default 5.0). |
| --quiesce-source | Pre-clone source quiesce: snapshot and revoke writes for the duration of the clone, restore in finally. Prevents partial-time-travel divergence under live writers. |
| --as-of-version | Clone tables as of this Delta version (alternative to --as-of-timestamp). |
| --checksum | Use checksum validation in addition to row counts (slower, catches silent drift). |
| --max-rps | Max SQL requests per second across all workers (rate limit). 0 = unlimited. |
| --parallel-tables | Tables cloned in parallel within one schema (default 1). |
| --order-by-size | asc (smallest first, so small tables finish early) or desc (biggest first, to fail fast on storage issues). |
| --no-comments | Skip copying table/column comments. |
| --no-constraints | Skip copying CHECK / NOT NULL constraints. |
| --no-properties | Skip copying Delta table properties. |
| --no-security | Skip copying row filters and column masks. |
| --no-progress | Disable progress bar (suppress for log-aggregator runs). |
| --dest-warehouse-id | Destination SQL warehouse ID for cross-workspace clones. |
| --resume | Resume from a previous rollback log file (e.g. rollback_logs/rollback_20260501.json). |
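
The flags above can be combined. As a minimal sketch, a filtered deep clone could be previewed with --dry-run and then run for real with validation and a rollback log. The catalog names (prod, prod_clone), schema names (bronze, silver), and the `^tmp_` pattern are hypothetical, and the sketch assumes the exclusion regex is matched against table names. The `clxs` function is only a stand-in that echoes each call so the snippet runs without Clone-Xs installed; remove it to invoke the real CLI.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# 1. Preview the SQL for a deep clone limited to two schemas, skipping tmp_ tables.
clxs clone --source prod --dest prod_clone \
  --clone-type DEEP \
  --include-schemas bronze,silver \
  --exclude-tables-regex '^tmp_' \
  --dry-run

# 2. Same selection for real, with validation, a rollback log, and a summary report.
clxs clone --source prod --dest prod_clone \
  --clone-type DEEP \
  --include-schemas bronze,silver \
  --exclude-tables-regex '^tmp_' \
  --validate --enable-rollback --report
```

Running the dry run first keeps the table selection and the generated SQL reviewable before any data is copied.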

diff

Compare structure of two catalogs.

clxs diff --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | First catalog |
| --dest | Second catalog |
| --format | Output format: text, json, csv |

compare

Deep compare with row counts and checksums.

clxs compare --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | First catalog |
| --dest | Second catalog |
| --include-schemas | Limit to specific schemas |

validate

Post-clone validation.

clxs validate --source <catalog> --dest <catalog> [options]

preflight

Pre-flight checks before cloning.

clxs preflight --source <catalog> --dest <catalog> [options]

sync

Two-way sync between catalogs.

clxs sync --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --dry-run | Preview changes |

rollback

Undo a clone operation.

clxs rollback --rollback-log <file> [options]
| Flag | Description |
| --- | --- |
| --rollback-log | Path to rollback log file |
| --dry-run | Preview what would be dropped |
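
A cautious rollback might preview the drop list before committing to it. The sketch below reuses the example log path quoted for clone's --resume flag; the `clxs` function is a stand-in that echoes each call so the snippet runs without Clone-Xs installed.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# Example path only; point this at the log written by your own clone run.
ROLLBACK_LOG=rollback_logs/rollback_20260501.json

# Preview what would be dropped, then run the rollback itself.
clxs rollback --rollback-log "$ROLLBACK_LOG" --dry-run
clxs rollback --rollback-log "$ROLLBACK_LOG"
```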

schema-drift

Detect schema changes over time.

clxs schema-drift --source <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | Catalog to analyze |
| --include-schemas | Comma-separated list of schemas to include |

stats

Catalog statistics and inventory.

clxs stats --source <catalog> [options]
clxs stats --catalog <catalog> [options]

search

Search catalog metadata.

clxs search --source <catalog> --pattern <regex> [options]
clxs search --catalog <catalog> --pattern <regex> [options]

profile

Column-level data profiling.

clxs profile --source <catalog> [options]
clxs profile --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog to profile |
| --include-schemas | Comma-separated list of schemas to include |

monitor

Continuous monitoring.

clxs monitor --source <catalog> --interval <seconds> [options]

export

Export metadata to CSV or JSON.

clxs export --source <catalog> --format <csv|json> --output <file> [options]
clxs export --catalog <catalog> --format <csv|json> --output <file> [options]

snapshot

Point-in-time catalog snapshot.

clxs snapshot --source <catalog> [options]
clxs snapshot --catalog <catalog> [options]

estimate

Cost estimation for clone operations.

clxs estimate --source <catalog> [options]
clxs estimate --catalog <catalog> [options]

generate-workflow

Generate Databricks Workflow JSON.

clxs generate-workflow [options]
| Flag | Description |
| --- | --- |
| --schedule | Cron expression |
| --output | Output file path |

export-iac

Export catalog as Terraform or Pulumi.

clxs export-iac --source <catalog> --format <terraform|pulumi> --output <file>

config-diff

Compare two config files.

clxs config-diff <file_a> <file_b>

init

Create default config file.

clxs init

auth

Authentication management.

clxs auth [options]
| Flag | Description |
| --- | --- |
| --login | Interactive browser login |
| --list-profiles | List configured CLI profiles |
| (default) | Show current auth status |

completion

Generate shell completions.

clxs completion <bash|zsh|fish>

run-sql

Execute arbitrary SQL.

clxs run-sql --warehouse-id <id> --sql "<statement>"

plan

Generate an execution plan (enhanced dry-run) showing all SQL that would be executed, with cost estimates.

clxs plan --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | Source catalog name |
| --dest | Destination catalog name |
| --format | Output format: console, json, html |
| --output | Write plan to file |

lint

Validate and lint the configuration file.

clxs lint [options]
| Flag | Description |
| --- | --- |
| -c, --config | Config file to lint |
| --profile | Config profile to lint |
| --strict | Treat warnings as errors (exit code 1) |

usage-analysis

Analyze table access patterns to find unused tables.

clxs usage-analysis --source <catalog> [options]
clxs usage-analysis --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog to analyze |
| --days | Lookback period in days (default: 90) |
| --unused-days | Threshold for "unused" (default: 30) |
| --recommend | Show tables recommended to skip |
| --output | Export analysis to JSON file |
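
One way to use this command is as a review step before a clone that passes --skip-unused. The sketch below assumes a hypothetical catalog pair (prod, prod_clone) and output file name. This reference does not say whether clone's --skip-unused applies the same thresholds as the analysis, so treat the pairing as illustrative. The `clxs` function is a stand-in that echoes each call.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# Review which tables look unused: 90-day lookback, 30-day "unused" threshold.
clxs usage-analysis --source prod --days 90 --unused-days 30 \
  --recommend --output unused_tables.json

# Then preview a clone that skips tables with no recent queries.
clxs clone --source prod --dest prod_clone --skip-unused --dry-run
```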

preview

Side-by-side data preview comparing source and destination tables.

clxs preview --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | Source catalog |
| --dest | Destination catalog |
| --table | Specific table (schema.table) |
| --all | Preview all tables |
| --limit | Rows per table (default: 10) |
| --order-by | Column for deterministic ordering |
| --max-tables | Max tables for --all (default: 20) |

metrics

View clone operation metrics and performance history.

clxs metrics [options]
| Flag | Description |
| --- | --- |
| --init | Create the metrics Delta table |
| --source | Filter by source catalog |
| --limit | Max results (default: 50) |
| --format | Output format: console or json (json is machine-readable) |

history

Git-style clone operation history.

clxs history <list|show|diff> [options]
| Subcommand or flag | Description |
| --- | --- |
| list | List recent clone operations |
| show <id> | Show details of a specific operation |
| diff <id1> <id2> | Compare two operations |
| --source | Filter by source catalog |
| --limit | Max results (default: 20) |

ttl

Manage data retention (TTL) policies on cloned catalogs.

clxs ttl <set|check|cleanup|extend|remove> [options]
| Subcommand or flag | Description |
| --- | --- |
| set | Set TTL on a catalog |
| check | List all TTL policies |
| cleanup | Drop expired catalogs |
| extend | Extend TTL by additional days |
| remove | Remove TTL policy |
| --dest | Target catalog |
| --days | TTL in days |
| --confirm | Confirm destructive cleanup |
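
A typical TTL lifecycle might look like the sketch below. The catalog name prod_clone and the day counts are hypothetical, and the sketch assumes --dest and --days are the only arguments set and extend need. The `clxs` function is a stand-in that echoes each call so the snippet runs without Clone-Xs installed.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

clxs ttl set --dest prod_clone --days 7      # expire the clone after 7 days
clxs ttl check                               # list all TTL policies
clxs ttl extend --dest prod_clone --days 3   # add 3 more days
clxs ttl cleanup --confirm                   # drop expired catalogs (destructive)
```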

rbac

RBAC policy management.

clxs rbac <check|show> [options]
| Subcommand or flag | Description |
| --- | --- |
| check | Check current user's permissions |
| show | Display the loaded RBAC policy |
| --user | Check permissions for a specific user |

approval

Clone approval workflow management.

clxs approval <list|approve|deny|status> [request-id]
| Subcommand or flag | Description |
| --- | --- |
| list | List pending approval requests |
| approve <id> | Approve a request |
| deny <id> | Deny a request |
| status <id> | Check request status |
| --user | User performing approval |
| --reason | Reason for denial |

impact

Analyze downstream impact before cloning.

clxs impact --source <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | Catalog to analyze |
| --dest | Destination catalog |
| --threshold | Number of downstream dependents to qualify as high-impact (default: 10) |

compliance-report

Generate audit-ready compliance reports.

clxs compliance-report [options]
| Flag | Description |
| --- | --- |
| --from | Start date (YYYY-MM-DD) |
| --to | End date (YYYY-MM-DD) |
| --format | Output: json, html, or all |
| --output-dir | Output directory (default: reports/compliance) |

plugin

Manage Clone-Xs plugins: list registered plugins, enable or disable them.

clxs plugin <list|enable|disable> [name]
| Subcommand | Description |
| --- | --- |
| list | List all registered plugins and their enabled/disabled status |
| enable <name> | Enable a plugin by name |
| disable <name> | Disable a plugin by name |

Examples:

# List all plugins
clxs plugin list

# Enable the optimize plugin
clxs plugin enable optimize

# Disable the slack-notify plugin
clxs plugin disable slack-notify

Plugins are loaded from paths specified in your config file under plugins. State is persisted to ~/.clone-xs/plugin_state.json.

See the Plugins Guide for writing custom plugins.


schedule

Run clones on a recurring schedule with drift detection.

clxs schedule --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --interval | Run interval (e.g., 30m, 1h, 6h) |
| --cron | Cron expression (e.g., 0 */6 * * *) |
| --no-drift-check | Skip drift detection |
| --max-runs | Stop after N runs (0 = unlimited) |
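
The two scheduling styles can be sketched as follows, using hypothetical catalog names. The cron expression must be quoted; otherwise the shell may expand the `*` characters into file names before Clone-Xs sees them. The `clxs` function is a stand-in that echoes each call.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# Interval form: run every 6 hours and stop after 4 runs.
clxs schedule --source prod --dest prod_clone --interval 6h --max-runs 4

# Cron form: minute 0 of every 6th hour. Keep the expression quoted.
clxs schedule --source prod --dest prod_clone --cron "0 */6 * * *"
```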

serve

Start a REST API server for clone operations.

clxs serve [options]
| Flag | Description |
| --- | --- |
| --port | Server port (default: 8080) |
| --host-addr | Bind address (default: 0.0.0.0) |
| --api-key | API key for authentication |
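
The default bind address 0.0.0.0 listens on every network interface. For local testing, a sketch like the one below binds to the loopback address instead and reads the key from an environment variable. The variable name CLXS_API_KEY and its placeholder value are assumptions made for this example, not something Clone-Xs defines. The `clxs` function is a stand-in that echoes each call.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# Hypothetical variable name; the placeholder is used only if it is unset.
: "${CLXS_API_KEY:=example-key-for-sketch}"

# Accept connections from this machine only, on the default port.
clxs serve --port 8080 --host-addr 127.0.0.1 --api-key "$CLXS_API_KEY"
```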

incremental-sync

Sync only changed tables using Delta table version history.

clxs incremental-sync [options]
| Flag | Description |
| --- | --- |
| --source | Source catalog name |
| --dest | Destination catalog name |
| --schema | Specific schema to sync |
| --clone-type | DEEP or SHALLOW |
| --dry-run | Preview without executing |

sample

Preview or compare table data samples.

clxs sample --schema S --table T [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Source catalog |
| --dest | Destination catalog (enables compare mode) |
| --schema | Schema name (required) |
| --table | Table name (required) |
| --limit | Number of rows (default: 10) |

view-deps

Analyze view and function dependencies with creation order.

clxs view-deps --schema S [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog name |
| --schema | Schema to analyze (required) |
| --output | Export dependency graph to JSON |

slack-bot

Start a Slack bot for clone operations via Socket Mode.

clxs slack-bot [options]
| Flag | Description |
| --- | --- |
| -c, --config | Config file path |

Requires environment variables: SLACK_BOT_TOKEN and SLACK_APP_TOKEN.


storage-metrics

Analyze per-table storage breakdown using ANALYZE TABLE ... COMPUTE STORAGE METRICS.

clxs storage-metrics --source <catalog> [options]
clxs storage-metrics --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog to analyze |
| --schema | Filter to specific schema |
| --table | Filter to specific table |
| --format | Output format: console, json, csv |
| --include-schemas | Comma-separated list of schemas to include |

optimize

Run OPTIMIZE on tables to compact small files.

clxs optimize --source <catalog> [options]
clxs optimize --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog name |
| --schema | Specific schema |
| --table | Specific table |
| --dry-run | Preview without executing |

vacuum

Run VACUUM on tables to remove old files beyond retention period.

clxs vacuum --source <catalog> [options]
clxs vacuum --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog name |
| --schema | Specific schema |
| --table | Specific table |
| --retention-hours | Retention period in hours (default: 168 / 7 days) |
| --dry-run | Preview files that would be deleted |
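
Because --retention-hours takes hours while retention policies are often stated in days, a small conversion avoids arithmetic slips (the default 168 is 7 × 24). The sketch below uses a hypothetical 14-day policy, catalog, and schema; the `clxs` function is a stand-in that echoes the call so the snippet runs without Clone-Xs installed.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

RETENTION_DAYS=14                            # hypothetical policy
RETENTION_HOURS=$((RETENTION_DAYS * 24))     # 14 days = 336 hours
echo "retention: ${RETENTION_DAYS} days = ${RETENTION_HOURS} hours"

# Preview which files would be removed before running without --dry-run.
clxs vacuum --source prod --schema bronze \
  --retention-hours "$RETENTION_HOURS" --dry-run
```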

create-job

Create a persistent Databricks Job that runs Clone-Xs on a schedule.

clxs create-job --source <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source | Source catalog name |
| --dest | Destination catalog name |
| --volume | UC Volume path for wheel upload |
| --job-name | Custom job name (default: Clone-Xs: source -> dest) |
| --schedule | Quartz cron expression |
| --timezone | Schedule timezone (default: UTC) |
| --notification-email | Comma-separated email addresses |
| --max-retries | Retries on failure (default: 0) |
| --timeout | Job timeout in seconds (default: 7200) |
| --tag | Job tag as key=value (repeatable) |
| --update-job-id | Update existing job instead of creating new |
| --run-now | Run the job immediately after creation |
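
Quartz cron expressions differ from classic five-field cron: they start with a seconds field. The sketch below uses the six-field expression shown in the generate-dab example, which fires at 02:00:00 every day. The volume path, tags, and catalog names are hypothetical; the `clxs` function is a stand-in that echoes the call.

```shell
# Stand-in: echoes each call instead of running Clone-Xs. Remove to use the real CLI.
clxs() { echo "clxs $*"; }

# Quartz fields: second minute hour day-of-month month day-of-week.
SCHEDULE="0 0 2 * * ?"

clxs create-job --source prod --dest prod_clone \
  --volume /Volumes/ops/tools/wheels \
  --schedule "$SCHEDULE" --timezone UTC \
  --max-retries 2 --timeout 7200 \
  --tag env=dev --tag team=data
```

Quoting "$SCHEDULE" matters here as well, since the expression contains `*` and `?`, both of which the shell treats as glob characters.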

pii-scan

Scan a catalog for personally identifiable information.

clxs pii-scan --source <catalog> [options]
clxs pii-scan --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Source catalog name |
| --sample-data | Enable data value sampling |
| --read-uc-tags | Read UC column tags for detection |
| --save-history | Save results to Delta tables |
| --apply-tags | Apply PII tags to UC columns after scan |
| --tag-prefix | Prefix for UC tags (default: pii) |
| --schema-filter | Filter to specific schemas |
| --table-filter | Regex filter on table names |
| --no-exit-code | Don't exit with code 1 if PII found |

state

Show current clone state between source and destination catalogs.

clxs state --source <catalog> --dest <catalog> [options]
clxs state --catalog <catalog> --dest <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Source catalog name |
| --dest | Destination catalog name |

cost-estimate

Estimate storage and compute costs for a catalog.

clxs cost-estimate --source <catalog> [options]
clxs cost-estimate --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog to estimate |

dep-graph

Generate a dependency graph for catalog objects.

clxs dep-graph --source <catalog> [options]
clxs dep-graph --catalog <catalog> [options]
| Flag | Description |
| --- | --- |
| --source, --catalog | Catalog to analyze |

rtbf

Right to Be Forgotten (GDPR Article 17) — manage erasure requests.

clxs rtbf <action> [options]

Actions:

| Action | Description |
| --- | --- |
| submit | Submit a new erasure request |
| discover | Discover subject data across all catalogs |
| impact | Show impact analysis for a request |
| approve | Approve a request for execution |
| execute | Execute deletion/anonymization |
| vacuum | VACUUM affected tables (physical deletion) |
| verify | Verify deletion completeness |
| certificate | Generate deletion certificate |
| list | List RTBF requests |
| status | Get request status and details |
| cancel | Cancel a request |
| overdue | Show overdue requests |

Submit flags:

| Flag | Required | Description |
| --- | --- | --- |
| --subject-type | Yes | email, customer_id, ssn, phone, name, national_id, passport, credit_card, custom |
| --subject-value | Yes | The identifier value to erase |
| --requester-email | Yes | Requester's email |
| --requester-name | Yes | Requester's name |
| --legal-basis | No | Legal basis (default: GDPR Art. 17(1)(a)) |
| --strategy | No | delete, anonymize, pseudonymize (default: delete) |
| --scope-catalogs | No | Limit search to specific catalogs |
| --grace-period-days | No | Days to wait before execution (default: 0) |
| --subject-column | No | Custom column name (required for custom type) |

Example workflow:

clxs rtbf submit --subject-type email --subject-value "user@example.com" \
--requester-email "dpo@corp.com" --requester-name "DPO"
clxs rtbf discover --request-id <ID> --subject-value "user@example.com"
clxs rtbf approve --request-id <ID>
clxs rtbf execute --request-id <ID> --subject-value "user@example.com"
clxs rtbf vacuum --request-id <ID>
clxs rtbf verify --request-id <ID> --subject-value "user@example.com"
clxs rtbf certificate --request-id <ID>

dsar

Data Subject Access Request (GDPR Article 15) — find and export subject data.

API / UI only — no CLI subcommand

DSAR has no clxs dsar parser today. Use POST /api/dsar/* (see api.md — DSAR) or the /dsar page in the web UI.

# Not implemented as a CLI command — example shape only:
clxs dsar <action> [options]
| Action | Description |
| --- | --- |
| submit | Submit a new access request |
| discover | Discover subject data across catalogs |
| approve | Approve a request for export |
| export | Export subject data to CSV/JSON/Parquet |
| report | Generate access report |
| deliver | Mark report as delivered |
| list | List DSAR requests |
| status | Get request status |
| cancel | Cancel a request |
| overdue | Show overdue requests |
overdueShow overdue requests

pipeline

Clone Pipelines — chain multiple operations into automated workflows.

API / UI only — no CLI subcommand

Pipelines have no clxs pipeline parser today. Use POST /api/pipeline/* (see api.md — Pipeline) or the Pipelines page in the web UI.

# Not implemented as a CLI command — example shape only:
clxs pipeline <action> [options]
| Action | Description |
| --- | --- |
| create | Create a new pipeline |
| run | Run a pipeline |
| list | List pipelines |
| status | Get run status |
| delete | Delete a pipeline |
| templates | List built-in templates |
| cancel | Cancel a running pipeline |

observability

Data Observability — unified health scoring dashboard.

API / UI only — no CLI subcommand

Observability has no clxs observability parser today. Use GET /api/observability/* (see api.md — Observability) or the Observability page in the web UI.

# Not implemented as a CLI command — example shape only:
clxs observability <action>
| Action | Description |
| --- | --- |
| dashboard | Show full observability dashboard |
| health | Show health score (0-100) |
| issues | List top issues |
| trends | Show metric trends |

dashboard

Launch the Streamlit web dashboard locally — quick alternative to running the full FastAPI + React stack when all you need is a read-only view of audit / metrics / reports.

clxs dashboard --port 8501
| Flag | Default | Description |
| --- | --- | --- |
| -c, --config | config/clone_config.yaml | Config file path |
| --port | 8501 | Dashboard port |

audit

Query the clone audit trail Delta table — every operation is logged here for compliance reporting.

clxs audit --init                         # Initialize audit table
clxs audit --source prod --limit 10 # Filter by source catalog
clxs audit --status failed # Filter by status
| Flag | Default | Description |
| --- | --- | --- |
| --init | false | Create the audit table if it doesn't exist |
| --source | (none) | Filter by source catalog name |
| --status | (none) | Filter by status (completed, failed, running) |
| --limit | 20 | Max results to return |

lineage

Query clone lineage — track which destination came from which source operation.

clxs lineage --init                       # Initialize lineage table
clxs lineage --table prod_clone.bronze.events
clxs lineage --operation-id abc123
| Flag | Default | Description |
| --- | --- | --- |
| --init | false | Create the lineage table |
| --table | (none) | Filter by destination table FQN |
| --operation-id | (none) | Filter by clone operation UUID |
| --limit | 50 | Max results |

policy-check

Check clone policies (guardrails) defined in a YAML file before executing a clone — fail-fast for forbidden source/destination combinations, missing tags, etc.

clxs policy-check --policy-file config/clone_policies.yaml
| Flag | Default | Description |
| --- | --- | --- |
| --policy-file | (none) | Path to policies YAML |

schema-evolve

Detect schema drift between source and destination, optionally applying ALTER TABLE statements to converge.

clxs schema-evolve --source prod --dest prod_clone --dry-run
clxs schema-evolve --source prod --dest prod_clone --drop-removed
| Flag | Default | Description |
| --- | --- | --- |
| --source | (config) | Override source catalog name |
| --dest | (config) | Override destination catalog name |
| --dry-run | false | Preview changes without applying |
| --drop-removed | false | Drop columns removed from source (destructive) |

tui

Launch the interactive terminal UI — a curses-style alternative to the web wizard for users who prefer to drive Clone-Xs from a single SSH session.

clxs tui -c config/clone_config.yaml
| Flag | Default | Description |
| --- | --- | --- |
| -c, --config | config/clone_config.yaml | Config file path |

templates

List built-in clone templates or export one as a YAML config you can edit.

clxs templates                                  # List
clxs templates --export dev-copy --output dev.yaml
| Flag | Default | Description |
| --- | --- | --- |
| --export | (none) | Template name to export (e.g. dev-copy, dr-backup) |
| --output | stdout | Output file path |
| -v, --verbose | false | Verbose output |
| --log-file | (none) | Log file path |

workspaces

List multi-cloud workspaces configured in clone_config.yaml's workspaces block — quick discovery of which target hosts the CLI knows about.

clxs workspaces

Uses standard global flags (--config, --profile, --verbose).


generate-dab

Generate a Databricks Asset Bundle (DAB) for clone jobs — emits the databricks.yml and resource definitions you can databricks bundle deploy for workspace-side scheduled execution.

clxs generate-dab --source prod --dest prod_clone --output dab_bundle/ \
--job-name nightly-prod-clone --schedule "0 0 2 * * ?" \
--notification-email oncall@example.com
| Flag | Default | Description |
| --- | --- | --- |
| --source | (config) | Override source catalog name |
| --dest | (config) | Override destination catalog name |
| --output | dab_bundle | Output directory |
| --job-name | (auto) | Job name |
| --schedule | (none) | Quartz cron expression for the bundle's schedule |
| --notification-email | (none) | Email for job notifications |

multi-clone

Clone to multiple destination workspaces in parallel — reads a YAML file describing each destination (host / token / catalog name) and fans out clones across them.

clxs multi-clone --source prod --destinations destinations.yaml --max-parallel 3
| Flag | Default | Description |
| --- | --- | --- |
| --source | (config) | Override source catalog name |
| --destinations | required | YAML file with destination workspace configs |
| --max-parallel | 2 | Max parallel workspace clones |

cost-estimate

Estimate clone cost before running — combines source storage size with the chosen warehouse type to predict DBU + storage spend.

clxs cost-estimate --source prod --clone-type DEEP --warehouse-type serverless
| Flag | Default | Description |
| --- | --- | --- |
| --source, --catalog | (config) | Override source catalog name |
| --clone-type | (none) | DEEP or SHALLOW for the estimate |
| --warehouse-type | serverless | serverless or classic |

pii-scan

Scan a catalog for PII columns — heuristic + UC-tag-based detection, with optional sample-data inspection and tag-application.

clxs pii-scan --source prod --schema-filter bronze silver \
--table-filter "customer|user" --sample-data --apply-tags --tag-prefix pii
| Flag | Default | Description |
| --- | --- | --- |
| --source, --catalog | (config) | Override source catalog name |
| --schema-filter | (none) | Only scan these schemas (space-separated) |
| --table-filter | (none) | Regex filter on table names |
| --sample-data | false | Sample actual values (slower, more accurate) |
| --no-exit-code | false | Don't exit non-zero if PII is found |
| --read-uc-tags | false | Read UC column tags to enhance detection |
| --save-history | false | Save scan results to Delta tables |
| --apply-tags | false | Apply PII tags to UC after scan |
| --tag-prefix | pii | Prefix for UC tags |

dep-graph

Build and display a table/view dependency graph for a catalog — input for clone-ordering decisions and impact analysis.

clxs dep-graph --source prod --output graph.json
| Flag | Default | Description |
| --- | --- | --- |
| --source, --catalog | (config) | Override source catalog name |
| --output | (stdout) | Export graph to JSON file |

distributed-clone

Generate a Spark-based distributed-clone notebook — for catalogs too large for a single warehouse to clone within reasonable wall-clock.

clxs distributed-clone --source prod --dest prod_clone --output notebooks/distributed_clone.py
| Flag | Default | Description |
| --- | --- | --- |
| --source | (config) | Override source catalog name |
| --dest | (config) | Override destination catalog name |
| --output | notebooks/distributed_clone.py | Output notebook path |

warehouse

Manage the SQL warehouse used for clone operations — start a stopped warehouse, scale it up/down, or check its current state.

clxs warehouse status
clxs warehouse start
clxs warehouse scale --size LARGE
| Positional | Description |
| --- | --- |
| action | status, start, or scale |

| Flag | Description |
| --- | --- |
| --size | New warehouse size (required for scale) |

state

Manage the clone state store Delta table — tracks every clone operation's per-table progress so failed runs can resume and stale destinations can be detected.

clxs state init
clxs state summary
clxs state stale --source prod
clxs state operations --limit 50
| Positional | Description |
| --- | --- |
| action | init, summary, stale, failed, mark-stale, or operations |

| Flag | Default | Description |
| --- | --- | --- |
| --source, --catalog | (config) | Override source catalog |
| --dest | (config) | Override destination catalog |
| --limit | 20 | Max results for operations action |