
Clone → Xs | Unity Catalog Toolkit

Unity Catalog Toolkit for Databricks

Catalog Clone — full Unity Catalog parity

Clone catalogs, schemas, or tables with permissions, tags, lineage, ownership, and Delta history. Cross-workspace via Delta Sharing. Iceberg sources accepted with a hidden-partition preflight.

6 Format Pairs — Delta ↔ Iceberg ↔ Parquet

In-place conversion between any two of the three formats, in either direction, with strategy dispatch (CONVERT TO DELTA, UniForm metadata, or CTAS+rename). A per-pair compatibility preflight rejects tables with GENERATED columns or hidden Iceberg partitioning before any DDL runs.
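The strategy dispatch and preflight described above might look roughly like the following Python sketch. The pair-to-strategy mapping, the `TableInfo` fields, and the function names are illustrative assumptions, not the toolkit's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import permutations


class Format(Enum):
    DELTA = "delta"
    ICEBERG = "iceberg"
    PARQUET = "parquet"


@dataclass(frozen=True)
class TableInfo:
    # Hypothetical table descriptor; the field names are assumptions.
    name: str
    fmt: Format
    has_generated_columns: bool = False
    has_hidden_partitioning: bool = False


# Assumed mapping from (source, target) to a strategy name.
# Pairs that are not listed fall back to CTAS+rename.
STRATEGIES = {
    (Format.PARQUET, Format.DELTA): "CONVERT TO DELTA",
    (Format.ICEBERG, Format.DELTA): "CONVERT TO DELTA",
    (Format.DELTA, Format.ICEBERG): "UniForm metadata",
}

# Three formats give six ordered (source, target) pairs.
FORMAT_PAIRS = list(permutations(Format, 2))


def preflight(table: TableInfo, target: Format) -> list:
    """Return a list of blocking problems; an empty list means safe to proceed."""
    problems = []
    if table.fmt is target:
        problems.append("source and target formats are identical")
    if table.has_generated_columns:
        problems.append("table has GENERATED columns")
    if table.fmt is Format.ICEBERG and table.has_hidden_partitioning:
        problems.append("Iceberg source uses hidden partitioning")
    return problems


def plan_conversion(table: TableInfo, target: Format) -> str:
    """Pick a strategy for the pair, refusing before any DDL would be issued."""
    problems = preflight(table, target)
    if problems:
        raise ValueError(f"{table.name}: " + "; ".join(problems))
    return STRATEGIES.get((table.fmt, target), "CTAS+rename")
```

Running the preflight inside `plan_conversion` means a rejected table never reaches the step that would issue DDL, which is the ordering the description implies.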

Demo Data Generator — 10 industries, medallion-ready

Realistic, scaled demo data for IoT, finance, retail, healthcare, energy, and more. Bronze / Silver / Gold layers, DQ profiles, ML training labels, and four streaming destinations — Volume, Auto Loader Bronze, direct INSERT, or low-latency Zerobus.

Data Quality — DQX integrated, 14 dimensions

Rules, scorecards, anomalies, freshness, volume, trust scores, observability, and incidents. Powered by Databricks DQX with declarative YAML and per-table coverage tracking.

Master Data Management — match, merge, govern

Golden records, match-merge with negative-match overrides, hierarchies, profiling, scorecards, and a full MDM audit log. Stewardship workflow built in.

FinOps — cost visibility end-to-end

Workspace, query, job, and warehouse cost breakdowns. Storage optimisation recommendations, budget alerts, cost of poor quality (COPQ) analysis, and compute-vs-storage trends from system tables.

Compliance & Audit — DSAR, RTBF, RBAC

Per-operation audit trails (clone_operations, convert_operations) with strategy fingerprints. Certifications, compliance frameworks (GDPR, HIPAA, SOC 2), consent flows, and PII detection.

Lineage & Impact Analysis

Trace data flows across catalogs and workspaces. Schema drift detection, view-dependency graphs, profiling histories, glossary terms, and impact analysis for every proposed change.

Reconciliation & Validation

Row-level, row-count, and aggregate reconciliation between source and target after every clone or convert. Diff-and-compare across formats, validation rule packs, and a rollback path when checks fail.
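As a rough illustration of the three reconciliation levels named above, the sketch below compares two in-memory row sets. The `reconcile` function, its parameters, and the report keys are hypothetical; a real run would query the source and target tables rather than Python lists.

```python
def reconcile(source_rows, target_rows, key, sum_columns):
    """Compare source and target rows at count, row, and aggregate level.

    Rows are dicts; `key` names the column that identifies a row and
    `sum_columns` lists numeric columns whose totals should match.
    """
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}

    report = {
        # Count check: same number of rows on both sides.
        "count_match": len(source_rows) == len(target_rows),
        # Row checks: keys present on one side only, and rows that differ.
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched_rows": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
        # Aggregate check: target total minus source total per column.
        "aggregate_diffs": {
            col: sum(r[col] for r in target_rows) - sum(r[col] for r in source_rows)
            for col in sum_columns
        },
    }
    report["passed"] = (
        report["count_match"]
        and not report["missing_in_target"]
        and not report["unexpected_in_target"]
        and not report["mismatched_rows"]
        and all(diff == 0 for diff in report["aggregate_diffs"].values())
    )
    return report


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
result = reconcile(source, target, key="id", sum_columns=["amount"])
```

Here the counts match, but row 2 differs and the `amount` total is off by 5, so `result["passed"]` is `False`. A failed report like this is the point at which a rollback would be triggered.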

Data Products — contracts, Marketplace, Sharing

Productise tables: declare data contracts, publish to the Databricks Marketplace, share via Delta Sharing across regions, and track SLAs per consumer.

Pipelines, CI/CD & Automation

Schedule clones, converts, and DQ runs. CI/CD via GitHub Actions or Asset Bundles, orchestration with Databricks Workflows, a plugin system, reusable templates, and 10+ operator playbooks.

AI Assistant & ML Asset Tracking

Natural-language data-quality rules, an in-app AI assistant for catalog Q&A, and ML asset tracking across model versions, registered features, and serving endpoints.