Catalog Clone — full Unity Catalog parity
Clone catalogs, schemas, or tables along with their permissions, tags, lineage, ownership, and Delta history. Cross-workspace cloning runs over Delta Sharing. Iceberg sources are accepted after a hidden-partition preflight check.
6 Format Pairs — Delta ↔ Iceberg ↔ Parquet
In-place conversion between any two of the three formats (six directed pairs), with strategy dispatch (CONVERT TO DELTA, UniForm metadata, or CTAS+rename). A per-pair compatibility preflight rejects tables with GENERATED columns or hidden Iceberg partitioning before any DDL runs.
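As a rough illustration of the dispatch-plus-preflight idea, the sketch below maps each directed format pair to one of the three named strategies and rejects incompatible tables before a strategy is returned. The pair-to-strategy assignment, the `TableInfo` fields, and all function names are assumptions made for this example, not the product's actual API.

```python
from dataclasses import dataclass
from itertools import permutations

FORMATS = ("delta", "iceberg", "parquet")

# Every ordered (source, target) pair of distinct formats: 3 * 2 = 6.
ALL_PAIRS = list(permutations(FORMATS, 2))

# Hypothetical assignment of pairs to strategies (an assumption for this
# sketch; the real mapping is not stated in the feature description).
STRATEGIES = {
    ("parquet", "delta"): "CONVERT TO DELTA",
    ("iceberg", "delta"): "CONVERT TO DELTA",
    ("delta", "iceberg"): "UniForm metadata",
}
DEFAULT_STRATEGY = "CTAS+rename"


@dataclass
class TableInfo:
    """Minimal, hypothetical description of a source table."""
    fmt: str
    has_generated_columns: bool = False
    has_hidden_partitioning: bool = False


class PreflightError(Exception):
    """Raised when a table fails the compatibility preflight."""


def preflight(table: TableInfo, target: str) -> None:
    # Reject anything that is not one of the six supported directed pairs.
    if (table.fmt, target) not in ALL_PAIRS:
        raise PreflightError(f"unsupported pair: {table.fmt} -> {target}")
    if table.has_generated_columns:
        raise PreflightError("GENERATED columns are not supported")
    if table.fmt == "iceberg" and table.has_hidden_partitioning:
        raise PreflightError("hidden Iceberg partitioning is not supported")


def choose_strategy(table: TableInfo, target: str) -> str:
    # The preflight runs first, so no strategy (and no DDL) is chosen
    # for a table that would fail partway through.
    preflight(table, target)
    return STRATEGIES.get((table.fmt, target), DEFAULT_STRATEGY)
```

Running the preflight inside `choose_strategy` means a caller cannot obtain a strategy for a table that has not been checked.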
Demo Data Generator — 10 industries, medallion-ready
Realistic, scaled demo data for IoT, finance, retail, healthcare, energy, and more. Bronze / Silver / Gold layers, DQ profiles, ML training labels, and four streaming destinations — Volume, Auto Loader Bronze, direct INSERT, or low-latency Zerobus.
Data Quality — DQX integrated, 14 dimensions
Rules, scorecards, anomalies, freshness, volume, trust scores, observability, and incidents. Powered by Databricks DQX with declarative YAML and per-table coverage tracking.
Master Data Management — match, merge, govern
Golden records, match-merge with negative-match overrides, hierarchies, profiling, scorecards, and a full MDM audit log. Stewardship workflow built in.
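To make the negative-match override concrete, here is a minimal sketch, assuming records are plain dictionaries with an `id` field and that matching is an exact comparison on a single key. The function names, the greedy clustering, and the "first non-empty value wins" merge rule are illustrative assumptions rather than the product's matching logic.

```python
def is_blocked(a_id, b_id, negative_matches):
    """True if a steward has marked this pair as 'never merge'."""
    return frozenset((a_id, b_id)) in negative_matches


def merge(cluster):
    """Build a golden record: the first non-empty value per field wins."""
    golden = {"source_ids": [rec["id"] for rec in cluster]}
    for rec in cluster:
        for field, value in rec.items():
            if field != "id" and value and field not in golden:
                golden[field] = value
    return golden


def match_merge(records, match_key, negative_matches):
    """Greedily cluster records that share match_key, honouring overrides."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            same_key = cluster[0][match_key] == rec[match_key]
            blocked = any(
                is_blocked(rec["id"], member["id"], negative_matches)
                for member in cluster
            )
            if same_key and not blocked:
                cluster.append(rec)
                break
        else:
            # No compatible cluster found: start a new one.
            clusters.append([rec])
    return [merge(cluster) for cluster in clusters]
```

With three records sharing an email address and an override on the pair (1, 3), records 1 and 2 merge into one golden record while record 3 stays separate.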
FinOps — cost visibility end-to-end
Workspace, query, job, and warehouse cost breakdowns. Storage optimisation recommendations, budget alerts, cost-of-poor-quality (COPQ) analysis, and compute-vs-storage trends from system tables.
Compliance & Audit — DSAR, RTBF, RBAC
Per-operation audit trails (clone_operations, convert_operations) with strategy fingerprints. Certifications, compliance frameworks (GDPR, HIPAA, SOC 2), consent flows, and PII detection.
Lineage & Impact Analysis
Trace data flows across catalogs and workspaces. Schema drift detection, view-dependency graphs, profiling histories, glossary terms, and impact analysis for every proposed change.
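Schema drift detection can be pictured as a comparison of two schema snapshots. The sketch below assumes each snapshot is a simple mapping of column name to type string; the function name and the shape of the result are hypothetical.

```python
def detect_schema_drift(old_schema, new_schema):
    """Compare two {column: type} snapshots and report the differences."""
    added = {c: t for c, t in new_schema.items() if c not in old_schema}
    removed = {c: t for c, t in old_schema.items() if c not in new_schema}
    type_changed = {
        c: (old_schema[c], new_schema[c])
        for c in old_schema.keys() & new_schema.keys()
        if old_schema[c] != new_schema[c]
    }
    return {"added": added, "removed": removed, "type_changed": type_changed}


def has_drift(report):
    """True if any category of the drift report is non-empty."""
    return any(report.values())
```

A removed or retyped column is the kind of change that would then feed into impact analysis for dependent views.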
Reconciliation & Validation
Row-level, count, and aggregate reconciliation between source and target after every clone or convert. Diff-and-compare across formats, validation rule packs, and a rollback path when checks fail.
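The three reconciliation levels can be sketched as follows, assuming source and target rows are available as lists of flat dictionaries with hashable values. The function name, the tolerance parameter, and the returned failure messages are assumptions for illustration; a real run would compute these checks in the query engine rather than in memory.

```python
from collections import Counter
from math import isclose


def reconcile(source_rows, target_rows, numeric_columns, rel_tol=1e-9):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []

    # Count check: both sides must hold the same number of rows.
    if len(source_rows) != len(target_rows):
        failures.append(f"row count: {len(source_rows)} != {len(target_rows)}")

    # Aggregate check: column sums must agree within a relative tolerance.
    for col in numeric_columns:
        src_sum = sum(row[col] for row in source_rows)
        tgt_sum = sum(row[col] for row in target_rows)
        if not isclose(src_sum, tgt_sum, rel_tol=rel_tol):
            failures.append(f"sum({col}): {src_sum} != {tgt_sum}")

    # Row-level check: compare rows as multisets, ignoring order.
    def as_key(row):
        return tuple(sorted(row.items()))

    src_counts = Counter(as_key(row) for row in source_rows)
    tgt_counts = Counter(as_key(row) for row in target_rows)
    missing = sum((src_counts - tgt_counts).values())
    unexpected = sum((tgt_counts - src_counts).values())
    if missing or unexpected:
        failures.append(f"rows: {missing} missing, {unexpected} unexpected")

    return failures
```

A non-empty result is the signal a caller would use to trigger the rollback path.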
Data Products — contracts, Marketplace, Sharing
Productise tables: declare data contracts, publish to the Databricks Marketplace, share via Delta Sharing across regions, and track SLAs per consumer.
Pipelines, CI/CD & Automation
Schedule clones, converts, and DQ runs. Includes CI/CD via GitHub Actions or Asset Bundles, Databricks Workflows integration, a plugin system, reusable templates, and 10+ operator playbooks.
AI Assistant & ML Asset Tracking
Natural-language data-quality rules, an in-app AI assistant for catalog Q&A, and ML asset tracking across model versions, registered features, and serving endpoints.