CI/CD Integration
GitHub Actions
name: Clone Catalog
on:
schedule:
- cron: '0 2 * * 0' # Every Sunday at 2 AM
workflow_dispatch:
jobs:
clone:
runs-on: ubuntu-latest
env:
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.13'
- run: pip install clone-xs
- name: Pre-flight checks
run: |
clxs preflight \
--source production \
--dest staging \
--warehouse-id ${{ vars.WAREHOUSE_ID }}
- name: Clone catalog
run: |
clxs clone \
--source production \
--dest staging \
--warehouse-id ${{ vars.WAREHOUSE_ID }} \
--validate \
--enable-rollback \
--report
Azure DevOps
trigger: none
schedules:
- cron: '0 2 * * 0'
displayName: Weekly catalog clone
branches:
include: [main]
pool:
vmImage: 'ubuntu-latest'
variables:
- group: databricks-credentials
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.13'
- script: pip install clone-xs
displayName: Install Clone Catalog
- script: |
clxs preflight \
--source production \
--dest staging \
--warehouse-id $(WAREHOUSE_ID)
displayName: Pre-flight checks
env:
DATABRICKS_HOST: $(DATABRICKS_HOST)
DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)
DATABRICKS_CLIENT_SECRET: $(DATABRICKS_CLIENT_SECRET)
- script: |
clxs clone \
--source production \
--dest staging \
--warehouse-id $(WAREHOUSE_ID) \
--validate \
--enable-rollback
displayName: Clone catalog
env:
DATABRICKS_HOST: $(DATABRICKS_HOST)
DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)
DATABRICKS_CLIENT_SECRET: $(DATABRICKS_CLIENT_SECRET)
GitLab CI
stages:
- clone
clxs:
image: python:3.13
stage: clone
only:
- schedules
variables:
DATABRICKS_HOST: $DATABRICKS_HOST
DATABRICKS_TOKEN: $DATABRICKS_TOKEN
script:
- pip install clone-xs
- clxs preflight --source production --dest staging --warehouse-id $WAREHOUSE_ID
- clxs clone --source production --dest staging --warehouse-id $WAREHOUSE_ID --validate
Databricks Workflows
Generate a Databricks Workflow definition for scheduled cloning:
clxs generate-workflow \
--schedule "0 0 2 * * ?" \
--job-name "nightly-staging-clone" \
--cluster-id "0310-abc123-def456" \
--notification-email "data-team@company.com"
Deploy with the Databricks CLI:
databricks jobs create --json @workflow.json
Generate Asset Bundle YAML
clxs generate-workflow --format yaml --output bundle/clone_job.yaml
Include the YAML in your Databricks Asset Bundle for GitOps-managed job deployment.
Config profiles for environments
Use config profiles to manage multiple environments from a single config file:
# config/clone_config.yaml
source_catalog: "production"
sql_warehouse_id: "abc123"
profiles:
dev:
destination_catalog: "dev_catalog"
clone_type: "SHALLOW"
copy_permissions: false
staging:
destination_catalog: "staging_catalog"
clone_type: "DEEP"
validate_after_clone: true
dr:
destination_catalog: "dr_catalog"
clone_type: "DEEP"
enable_rollback: true
slack_webhook_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"
# In different pipelines
clxs clone --profile dev
clxs clone --profile staging
clxs clone --profile dr
Service principal authentication
For CI/CD, use a service principal rather than a personal access token:
# Databricks OAuth Service Principal
export DATABRICKS_HOST="https://adb-xxx.azuredatabricks.net"
export DATABRICKS_CLIENT_ID="your-sp-client-id"
export DATABRICKS_CLIENT_SECRET="your-sp-secret"
# Azure AD Service Principal
export DATABRICKS_HOST="https://adb-xxx.azuredatabricks.net"
export AZURE_CLIENT_ID="your-azure-client-id"
export AZURE_CLIENT_SECRET="your-azure-secret"
export AZURE_TENANT_ID="your-tenant-id"
Terraform / Pulumi export
Export your catalog structure as Infrastructure-as-Code:
# Terraform
clxs export-iac --source production --format terraform --output catalog.tf
# Pulumi
clxs export-iac --source production --format pulumi --output catalog_pulumi.py
Config diff for PR reviews
Compare config changes before merging:
clxs config-diff config/staging_old.yaml config/staging_new.yaml
Output:
============================================================
CONFIG DIFF
A: config/staging_old.yaml
B: config/staging_new.yaml
============================================================
Added in B (1):
+ validate_checksum: true
Removed from B (1):
- dry_run: true
Changed (2):
~ max_workers: 4 -> 8
~ parallel_tables: 1 -> 4
============================================================