Clone Pipelines

Chain multiple clone operations into reusable, automated workflows. Build pipelines from 6 step types, use pre-built templates, configure failure policies, and track execution history.

Overview

A pipeline is an ordered list of steps that execute sequentially:

  Clone ──▶ Mask PII ──▶ Validate ──▶ Notify
    │          │            │           │
    ▼          ▼            ▼           ▼
  DEEP    hash/redact  row counts     Slack

Each step has:

  • Type — clone, mask, validate, notify, vacuum, custom_sql
  • Config — step-specific parameters
  • On failure — abort (stop), skip (continue), or retry (up to 3x with backoff)

Quick start

# Create from template
clxs pipeline templates # list available templates
clxs pipeline create-from-template production-to-dev

# Create custom pipeline
clxs pipeline create --name "My Pipeline" --steps '[
  {"type":"clone","name":"Clone catalog","on_failure":"abort"},
  {"type":"validate","name":"Validate","on_failure":"abort"},
  {"type":"notify","name":"Alert team","on_failure":"skip"}
]'

# Run
clxs pipeline run --pipeline-id <ID>
clxs pipeline status --run-id <RUN_ID>
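
When the step list is generated by a script rather than typed by hand, quoting the JSON for the shell is the part most likely to go wrong. The sketch below builds the same `--steps` payload as the quick-start example and prints a shell-safe command line. It uses only the `--name` and `--steps` flags shown above; the helper name `build_create_command` is hypothetical and not part of clxs.

```python
import json
import shlex

# Step definitions mirroring the quick-start example above.
steps = [
    {"type": "clone", "name": "Clone catalog", "on_failure": "abort"},
    {"type": "validate", "name": "Validate", "on_failure": "abort"},
    {"type": "notify", "name": "Alert team", "on_failure": "skip"},
]


def build_create_command(name, steps):
    """Return a `clxs pipeline create` invocation as a shell-safe string.

    Hypothetical helper: it only assembles the --name and --steps flags
    that appear in the quick start.
    """
    steps_json = json.dumps(steps, separators=(",", ":"))
    argv = ["clxs", "pipeline", "create", "--name", name, "--steps", steps_json]
    # Quote every argument so spaces and JSON punctuation survive the shell.
    return " ".join(shlex.quote(arg) for arg in argv)


command = build_create_command("My Pipeline", steps)
print(command)
```

Serializing with `json.dumps` and quoting with `shlex.quote` avoids hand-escaping the embedded double quotes.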

Step types

Type        What it does                                    Failure default
clone       Deep/shallow clone the catalog                  abort
mask        Apply configured masking rules to PII columns   abort
validate    Row count validation (optional checksums)       abort
notify      Send Slack/Teams notification                   skip
vacuum      Run VACUUM on destination tables                skip
custom_sql  Execute arbitrary SQL statement                 abort
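
The defaults in the table above can be read as a lookup applied when a step omits `on_failure`. The following is a minimal sketch of that idea, not the tool's actual implementation: `normalize_step` is a hypothetical helper, and the sketch does not model how the global `default_on_failure` setting interacts with these per-type defaults, since the page does not say.

```python
# Per-type defaults copied from the step-type table above.
DEFAULT_ON_FAILURE = {
    "clone": "abort",
    "mask": "abort",
    "validate": "abort",
    "notify": "skip",
    "vacuum": "skip",
    "custom_sql": "abort",
}
VALID_POLICIES = {"abort", "skip", "retry"}


def normalize_step(step):
    """Validate a step dict and fill in its default failure policy.

    Hypothetical helper for illustration; clxs may validate differently.
    """
    step_type = step.get("type")
    if step_type not in DEFAULT_ON_FAILURE:
        raise ValueError(f"unknown step type: {step_type!r}")
    policy = step.get("on_failure", DEFAULT_ON_FAILURE[step_type])
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown failure policy: {policy!r}")
    # Return a copy so the caller's dict is left untouched.
    return {**step, "on_failure": policy}


print(normalize_step({"type": "notify", "name": "Alert team"}))
```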

Built-in templates

Template            Steps                                                      Use case
production-to-dev   Clone, Mask PII, Validate, Notify                          Refresh dev from production with PII protection
clone-and-validate  Clone, Validate (checksums)                                Quick clone with verification
refresh-dev         Vacuum, Clone, Mask, Validate, Notify                      Full dev refresh cycle
compliance-clone    Preflight SQL, Clone, Mask, Validate (checksums), Notify   Audit-compliant clone

Failure policies

Policy  Behavior
abort   Stop the pipeline immediately. Mark run as failed.
skip    Log the failure, continue to the next step.
retry   Retry up to N times (default 3) with exponential backoff. Abort if all retries fail.
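
To make the three policies concrete, here is a small sketch of a sequential runner that applies them. It rests on several assumptions that the page does not confirm: "up to N times" is treated as N total attempts, the backoff doubles after each failed attempt, and the status strings and the `run` callable on each step are hypothetical names chosen for illustration.

```python
import time


def run_pipeline(steps, max_attempts=3, backoff_seconds=30, sleep=time.sleep):
    """Run steps in order, applying each step's failure policy.

    Each step is a dict with 'name', 'on_failure' and a zero-argument
    'run' callable (a hypothetical shape, not the clxs step format).
    """
    results = []
    for step in steps:
        policy = step.get("on_failure", "abort")
        attempts = max_attempts if policy == "retry" else 1
        error = None
        for attempt in range(attempts):
            try:
                step["run"]()
                error = None
                break
            except Exception as exc:
                error = exc
                if attempt < attempts - 1:
                    # Assumed exponential backoff: 30s, 60s, 120s, ...
                    sleep(backoff_seconds * 2 ** attempt)
        if error is None:
            results.append((step["name"], "succeeded"))
        elif policy == "skip":
            # Record the failure and move on to the next step.
            results.append((step["name"], "skipped"))
        else:
            # 'abort', or 'retry' with all attempts exhausted.
            results.append((step["name"], "failed"))
            return "failed", results
    return "succeeded", results
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.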

Configuration

pipelines:
  max_concurrent_steps: 1 # sequential execution
  default_on_failure: abort
  retry_max_attempts: 3
  retry_backoff_seconds: 30
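
As a rough guide to how the two retry settings combine, the sketch below mirrors the `pipelines` keys in a dataclass and derives the waits between attempts. It assumes `retry_backoff_seconds` is the initial delay and that the delay doubles after each failed attempt; the page says only "exponential backoff", so the exact multiplier is a guess. `PipelineSettings` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical mirror of the `pipelines` config block above."""

    max_concurrent_steps: int = 1
    default_on_failure: str = "abort"
    retry_max_attempts: int = 3
    retry_backoff_seconds: int = 30

    def backoff_delays(self):
        """Waits between attempts, assuming the delay doubles each time."""
        return [
            self.retry_backoff_seconds * 2 ** i
            for i in range(self.retry_max_attempts - 1)
        ]


defaults = PipelineSettings()
print(defaults.backoff_delays())  # [30, 60]

custom = PipelineSettings(retry_max_attempts=4)
print(custom.backoff_delays())  # [30, 60, 120]
```

Under these assumptions, a step with the default settings waits at most 90 seconds in total before the pipeline aborts.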

Audit trail

Three Delta tables track pipeline state:

  • pipelines — pipeline definitions (name, steps JSON, template source)
  • pipeline_runs — execution runs (status, timing, triggered_by)
  • pipeline_step_results — per-step results (status, duration, error, result JSON)

Next steps

  • Clone — understand clone operations used as pipeline steps
  • PII Detection — masking steps use PII detection patterns
  • Scheduling — schedule pipeline runs on cron intervals