Governance
Clone Catalog provides governance features to enforce access policies, require approvals for sensitive operations, generate compliance reports, and protect PII during cloning.
Role-based access control (RBAC)
Control who can clone what, and where, using declarative policy files.
Real-world scenario
Your organization has multiple teams sharing a Databricks workspace. The data engineering team should be able to clone any catalog to dev_* destinations, but only the platform team should clone to production_dr. RBAC policies enforce these rules so a misconfigured CI pipeline cannot accidentally overwrite a production replica.
Usage
# Check if the current user can perform a clone
clxs rbac check
# Show all policies that apply to the current user
clxs rbac show
# Show policies for a specific user
clxs rbac show --user "data-engineering@company.com"
# Validate RBAC policy file syntax
clxs policy-check --policy-file rbac_policy.yaml
Output (rbac check):
============================================================
RBAC CHECK: production → dev_sandbox
============================================================
Principal: data-engineering@company.com
Source: production [ALLOWED]
Destination: dev_sandbox [ALLOWED]
Clone Type: DEEP [ALLOWED]
Result: PERMITTED
============================================================
Policy file format
# config/rbac_policy.yaml
rbac:
  policies:
    - name: "Data Engineering - Dev Access"
      principals:
        - "data-engineering@company.com"
        - "de-leads@company.com"
      sources:
        - "production"
        - "staging"
      destinations:
        - "dev_*"       # Wildcard: any catalog starting with dev_
        - "sandbox_*"
      allowed_clone_types:
        - "DEEP"
        - "SHALLOW"
      allowed_operations:
        - "clone"
        - "sync"
        - "diff"
        - "validate"
    - name: "Platform Team - Full Access"
      principals:
        - "platform-team@company.com"
      sources:
        - "*"           # Any source
      destinations:
        - "*"           # Any destination
      allowed_clone_types:
        - "DEEP"
        - "SHALLOW"
    - name: "Analysts - Read Only"
      principals:
        - "analysts@company.com"
      sources:
        - "production"
      destinations: []  # No clone destinations allowed
      allowed_operations:
        - "diff"
        - "compare"
        - "stats"
        - "search"
  deny_rules:
    - name: "Block production overwrites"
      principals:
        - "*"
      destinations:
        - "production"
      reason: "Cloning into the production catalog is prohibited."
    - name: "Block PII catalogs for contractors"
      principals:
        - "contractors@company.com"
      sources:
        - "hr_data"
        - "pii_*"
      reason: "Contractors cannot access PII catalogs."
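To make the wildcard and deny-rule behaviour concrete, here is a minimal Python sketch of how a policy file like the one above could be evaluated. It is not the tool's implementation. It assumes glob-style matching (via the standard `fnmatch` module), that deny rules take precedence over allow policies, and that a field omitted from a deny rule matches anything. The `matches` and `check_clone` names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the policy file; field names mirror the YAML.
POLICIES = [
    {"principals": ["data-engineering@company.com"],
     "sources": ["production", "staging"],
     "destinations": ["dev_*", "sandbox_*"]},
    {"principals": ["platform-team@company.com"],
     "sources": ["*"], "destinations": ["*"]},
]
DENY_RULES = [
    {"principals": ["*"], "destinations": ["production"],
     "reason": "Cloning into the production catalog is prohibited."},
]

def matches(value, patterns):
    """True if value matches any glob pattern (case-sensitive)."""
    return any(fnmatchcase(value, p) for p in patterns)

def check_clone(principal, source, dest):
    """Return (permitted, reason). Assumption: deny rules win over policies."""
    for rule in DENY_RULES:
        # Assumption: a field missing from a deny rule matches anything.
        if (matches(principal, rule["principals"])
                and matches(source, rule.get("sources", ["*"]))
                and matches(dest, rule.get("destinations", ["*"]))):
            return False, rule["reason"]
    for policy in POLICIES:
        if (matches(principal, policy["principals"])
                and matches(source, policy["sources"])
                and matches(dest, policy["destinations"])):
            return True, "allowed by policy"
    return False, "no matching policy"
```

Under these assumptions, the data engineering group can clone production to dev_sandbox but not to production_dr, and even the platform team is stopped by the deny rule when the destination is production.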
Enforced operations
RBAC checks run automatically before every clone, sync, diff, and incremental-sync operation. If the current principal is not permitted, the operation is blocked:
# RBAC is enforced automatically on clone
clxs clone --source staging --dest production
# [RBAC] Denied: Cloning into the production catalog is prohibited.
# Clone aborted.
# RBAC is also enforced on sync, diff, and incremental-sync
clxs sync --source production --dest staging
# [RBAC] Denied: sync operation not allowed for user@company.com
Operation-level permissions
The allowed_operations field controls which operations a principal can perform. Use "*" to allow all operations:
rbac:
  policies:
    - name: "Analysts - Read Only"
      principals:
        - "analysts@company.com"
      sources:
        - "production"
      destinations: []
      allowed_operations:
        - "diff"      # Can compare catalogs
        - "stats"     # Can view statistics
        - "search"    # Can search metadata
        # Cannot clone, sync, or incremental-sync
    - name: "Platform Team - Full Access"
      principals:
        - "platform-team@company.com"
      sources: ["*"]
      destinations: ["*"]
      allowed_operations:
        - "*"         # All operations permitted
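The operation check can be pictured as a simple membership test. The sketch below is illustrative only: `operation_allowed` is a hypothetical helper, and it requires the allowed_operations field to be present because this guide does not state what the tool does when the field is omitted.

```python
def operation_allowed(policy, operation):
    """True if the policy lists the operation or the "*" wildcard.

    Assumes allowed_operations is present; behaviour when the field is
    omitted is not covered by this sketch.
    """
    allowed = policy["allowed_operations"]
    return "*" in allowed or operation in allowed

analysts = {"name": "Analysts - Read Only",
            "allowed_operations": ["diff", "stats", "search"]}
platform = {"name": "Platform Team - Full Access",
            "allowed_operations": ["*"]}

print(operation_allowed(analysts, "diff"))   # True
print(operation_allowed(analysts, "clone"))  # False
print(operation_allowed(platform, "sync"))   # True
```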
API endpoints
Manage RBAC policies programmatically via the REST API:
| Method | Endpoint | Description |
|---|---|---|
| GET | /rbac/policies | List all RBAC policies |
| POST | /rbac/policies | Create a new RBAC policy |
| DELETE | /rbac/policies | Delete an RBAC policy by name |
# List all policies
curl http://localhost:8080/rbac/policies

# Create a policy
curl -X POST http://localhost:8080/rbac/policies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Dev Team Access",
    "principals": ["dev@company.com"],
    "sources": ["staging"],
    "destinations": ["dev_*"],
    "allowed_operations": ["clone", "sync", "diff"]
  }'

# Delete a policy
curl -X DELETE http://localhost:8080/rbac/policies \
  -H "Content-Type: application/json" \
  -d '{"name": "Dev Team Access"}'
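The same calls can be made from a script. The following sketch uses only the Python standard library and mirrors the curl examples above (same host, port, path, and JSON bodies). `build_policy_request` is a hypothetical helper; it builds the request without sending it, and the response format is not assumed.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples

def build_policy_request(method, payload=None):
    """Build (but do not send) a request against /rbac/policies."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(f"{BASE_URL}/rbac/policies", data=data,
                   headers=headers, method=method)

list_req = build_policy_request("GET")
create_req = build_policy_request("POST", {
    "name": "Dev Team Access",
    "principals": ["dev@company.com"],
    "sources": ["staging"],
    "destinations": ["dev_*"],
    "allowed_operations": ["clone", "sync", "diff"],
})
delete_req = build_policy_request("DELETE", {"name": "Dev Team Access"})
```

With the API server running, pass any of these to `urllib.request.urlopen` to send it.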
RBAC enforcement should only be bypassed in development environments. In CI/CD pipelines, always leave RBAC enabled to prevent unauthorized operations.
Approval workflows
Require human approval before sensitive clone operations proceed.
Real-world scenario
Your team wants to clone the finance catalog — which contains sensitive revenue data — to a new finance_qa environment. Company policy requires a manager to approve any operation that copies financial data. The approval workflow pauses the clone and sends a Slack message to the approver. The clone resumes only after the manager approves.
Usage
# Submit a clone for approval
clxs clone \
--source finance --dest finance_qa \
--require-approval
# List pending approvals
clxs approval list
# Approve a pending request (by ID)
clxs approval approve ap-2026031401
# Deny a request with a reason
clxs approval deny ap-2026031401 \
--reason "Use the existing QA catalog instead"
# Check status of a specific request
clxs approval status ap-2026031401
Output (approval list):
============================================================
PENDING APPROVALS
============================================================
ID Source Dest Requester Submitted
ap-2026031401 finance finance_qa alice@company.com 2026-03-14 09:15:00
ap-2026031302 hr_data hr_staging bob@company.com 2026-03-13 14:30:00
============================================================
Configuration
approval:
  enabled: true
  channel: "slack"                 # slack | cli | webhook
  slack_webhook: "https://hooks.slack.com/services/T00/B00/xxxx"
  approvers:
    - "manager@company.com"
    - "platform-team@company.com"
  timeout_hours: 24                # Auto-deny after 24 hours
  require_reason_on_deny: true     # Force approvers to explain denials
  catalogs_requiring_approval:     # Only these catalogs need approval
    - "finance"
    - "hr_data"
    - "pii_*"
How it works
- User runs clxs clone --require-approval
- The tool creates an approval request and sends a notification (Slack/webhook/CLI)
- The clone process enters a waiting state and polls for approval status
- An approver runs clxs approval approve <id> (or clicks the Slack button)
- The clone resumes automatically
If the timeout expires or the request is denied, the clone is aborted.
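The waiting state can be thought of as a poll-until-decided loop with a deadline. The sketch below is a generic illustration, not the tool's code: `get_status` is a hypothetical callable, and the status strings ("pending", "approved", "denied") and the "timed_out" return value are assumptions.

```python
import time

def wait_for_approval(get_status, timeout_seconds, poll_seconds=30,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a decision is made or the timeout passes.

    get_status is a hypothetical callable returning "pending",
    "approved", or "denied". Returns the decision, or "timed_out".
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        status = get_status()
        if status in ("approved", "denied"):
            return status
        sleep(poll_seconds)  # wait before asking again
    return "timed_out"

# Example with a canned sequence of statuses and no real waiting.
statuses = iter(["pending", "pending", "approved"])
decision = wait_for_approval(lambda: next(statuses), timeout_seconds=60,
                             poll_seconds=0, sleep=lambda s: None)
print(decision)  # approved
```

A caller would resume the clone only when the result is "approved" and abort otherwise, which matches the behaviour described above for denials and timeouts.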
Approval state is stored locally in .clxs/approvals/. In team environments, consider using the API server mode (clxs serve) for shared approval state.
Compliance reports
Generate audit-ready reports of all clone operations, data access patterns, and security posture.
Real-world scenario
Your compliance team needs a quarterly report covering: which catalogs were cloned, who performed each operation, whether PII data was involved, and what permissions were applied. Instead of manually piecing together logs, you generate a compliance report that covers all of this in one command.
Usage
# Generate a compliance report for the last 30 days
clxs compliance-report
# Custom date range
clxs compliance-report \
--from 2026-01-01 --to 2026-03-14
# Output as HTML (shareable with non-technical stakeholders)
clxs compliance-report --format html --output-dir reports/
# Output as JSON (for integration with GRC tools)
clxs compliance-report --format json --output-dir reports/
Output (console):
============================================================
COMPLIANCE REPORT: 2026-01-01 to 2026-03-14
============================================================
OPERATIONS SUMMARY
------------------
Total clone operations: 47
Successful: 44
Failed: 2
Rolled back: 1
Approvals required: 5
Approvals denied: 1
PII DATA HANDLING
------------------
Clones involving PII: 8
PII columns masked: 23
Masking strategies used: hash (12), redact (8), null (3)
Unmasked PII clones: 0
PERMISSIONS AUDIT
------------------
Permission copies: 44
Ownership transfers: 44
RBAC violations blocked: 3
DATA LINEAGE
------------------
Source catalogs used: 3 (production, staging, analytics)
Destination catalogs: 7
Cross-workspace clones: 2
VALIDATION RESULTS
------------------
Validated clones: 40
Validation pass rate: 97.5%
Checksum validations: 12
============================================================
Report saved to: reports/compliance_20260314.txt
Report sections
| Section | Contents |
|---|---|
| Operations Summary | Clone counts, success/failure rates, rollback events |
| PII Data Handling | PII scans performed, masking applied, unmasked copies |
| Permissions Audit | Permission copies, ownership transfers, RBAC blocks |
| Data Lineage | Source/destination mapping, cross-workspace activity |
| Validation Results | Post-clone validation pass rates, checksum results |
Retention policy
compliance:
  report_retention_days: 365   # Keep reports for 1 year
  log_retention_days: 90       # Keep detailed operation logs for 90 days
  auto_generate: "monthly"     # Auto-generate reports: daily | weekly | monthly
  output_directory: "reports/"
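If you also archive reports outside the tool, a retention window like report_retention_days can be checked with a few lines of Python. This is a sketch under assumptions: it judges age by file modification time and matches the compliance_* file naming seen in the console output. How the tool itself applies retention is not described here, and `expired_reports` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from pathlib import Path

def expired_reports(report_dir, retention_days, now=None):
    """Return report files last modified before the retention cutoff.

    Assumption: age is judged by file modification time.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        p for p in Path(report_dir).glob("compliance_*")
        if p.is_file() and datetime.fromtimestamp(p.stat().st_mtime) < cutoff
    )

# List (without deleting) reports older than one year, if the folder exists.
for path in expired_reports("reports/", retention_days=365):
    print(f"expired: {path}")
```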
Schedule compliance report generation in your CI/CD pipeline or cron job to ensure reports are always up to date for auditors.
Pre-flight permission checks
Validate Unity Catalog permissions before cloning, with implicit and inherited grant detection.
Pre-flight checks (clxs preflight) now detect implicit UC privileges that previous versions would miss:
| Check | Detects |
|---|---|
| dest_manage_permission | Catalog ownership, catalog-level MANAGE, schema-level MANAGE |
| dest_create_table | Ownership, MANAGE (implies CREATE TABLE), schema-level CREATE TABLE |
| source_use_catalog | Ownership (shows "(owner)"), USE CATALOG grant |
| create_catalog_permission | Metastore-level CREATE CATALOG grant |
When a check fails, the CLI and Web UI display the exact GRANT command needed to fix it:
GRANT USE CATALOG ON CATALOG my_catalog TO `user@company.com`;
GRANT CREATE TABLE ON SCHEMA my_catalog.bronze TO `user@company.com`;
In the Web UI, these commands are clickable code blocks (click to copy) with links to the Unity Catalog privileges documentation.
Run clxs preflight before every clone in CI/CD pipelines. Failed permission checks now give actionable GRANT commands instead of generic error messages.
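For pipelines that want to produce the same kind of remediation hint themselves, the GRANT statements shown above follow a simple template. The sketch below formats them in Python; `grant_statement` is a hypothetical helper, not part of the CLI, and it does not validate privilege or securable names.

```python
def grant_statement(privilege, securable_type, securable_name, principal):
    """Format a Unity Catalog GRANT statement with a backtick-quoted principal."""
    safe_principal = principal.replace("`", "``")  # escape embedded backticks
    return (f"GRANT {privilege} ON {securable_type} {securable_name} "
            f"TO `{safe_principal}`;")

print(grant_statement("USE CATALOG", "CATALOG", "my_catalog", "user@company.com"))
print(grant_statement("CREATE TABLE", "SCHEMA", "my_catalog.bronze", "user@company.com"))
```

The two calls reproduce the two example statements from this section.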
PII and data masking
Detect and mask personally identifiable information during cloning to protect sensitive data in non-production environments.
Clone Catalog includes a comprehensive PII detection engine with structural validators, cross-column correlation, Unity Catalog tag integration, scan history tracking, and remediation workflows.
For the full PII detection documentation, see the dedicated PII Detection & Protection guide.
Quick example
# Scan with all detection methods enabled
clxs pii-scan --source production \
--sample-data --read-uc-tags --save-history
# Clone with masking rules to protect PII
clxs clone --source production --dest dev \
--config config/clone_config.yaml
# config/clone_config.yaml
masking_rules:
  - column: "email|email_address"
    strategy: "email_mask"
    match_type: "regex"
  - column: "ssn|social_security"
    strategy: "hash"
    match_type: "regex"
  - column: "phone|mobile"
    strategy: "partial"
    match_type: "regex"
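To illustrate how regex rules like these select a strategy per column, here is a small Python sketch. It is an approximation only. It assumes an unanchored, case-insensitive regex search on the column name, and the `mask` function shows one plausible reading of each strategy name (SHA-256 for hash, last four characters kept for partial, first character of the local part kept for email_mask). The tool's actual masking output may differ.

```python
import hashlib
import re

MASKING_RULES = [
    {"column": "email|email_address", "strategy": "email_mask"},
    {"column": "ssn|social_security", "strategy": "hash"},
    {"column": "phone|mobile", "strategy": "partial"},
]

def strategy_for(column_name):
    """Return the strategy of the first rule whose regex matches, else None."""
    for rule in MASKING_RULES:
        # Assumption: unanchored, case-insensitive search on the column name.
        if re.search(rule["column"], column_name, flags=re.IGNORECASE):
            return rule["strategy"]
    return None

def mask(value, strategy):
    """Apply an assumed interpretation of each strategy to a string value."""
    if strategy == "hash":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if strategy == "partial":
        return "*" * max(len(value) - 4, 0) + value[-4:]
    if strategy == "email_mask":
        local, _, domain = value.partition("@")
        return local[:1] + "***@" + domain
    raise ValueError(f"unknown strategy: {strategy}")

print(strategy_for("customer_email"))            # email_mask
print(mask("alice@company.com", "email_mask"))   # a***@company.com
print(mask("555-123-4567", "partial"))           # ********4567
```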
Tie-in with compliance
When PII masking is applied during a clone, the compliance report automatically records:
- Which columns were masked
- Which masking strategy was used
- Whether any PII columns were left unmasked (flagged as a compliance risk)
Use clxs pii-scan --apply-tags to automatically tag PII columns in Unity Catalog, enabling downstream data governance policies.
Right to Be Forgotten (RTBF)
Clone-Xs includes a full GDPR Article 17 erasure workflow for handling data subject deletion requests across all cloned catalogs. The RTBF module:
- Discovers subject data across every cloned catalog using PII detection patterns
- Deletes or anonymizes the data with configurable strategies
- VACUUMs Delta history to physically remove time-travel data
- Verifies the deletion by re-querying all affected tables
- Generates compliance certificates (HTML + JSON) for DPO/legal review
RTBF supports 34 legal bases from 18 global jurisdictions (EU GDPR, UK GDPR, US CCPA/CPRA + state laws, Brazil LGPD, India DPDPA, and more).
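The delete, VACUUM, and verify steps map onto ordinary Delta Lake SQL. The sketch below only builds the statement text for one table so the order is visible. It is not the RTBF module's code: `erasure_statements`, the table and column names, and the `:subject_value` parameter marker are all illustrative assumptions.

```python
def erasure_statements(table, column):
    """Return the SQL for one table, in order: delete, vacuum, verify."""
    return [
        # 1. Remove the subject's rows from the current table version.
        f"DELETE FROM {table} WHERE {column} = :subject_value",
        # 2. Remove data files no longer referenced by the table. Files are
        #    only deleted once they are older than the retention window.
        f"VACUUM {table}",
        # 3. Verify: this count should be zero after the delete.
        f"SELECT COUNT(*) AS remaining FROM {table} WHERE {column} = :subject_value",
    ]

for statement in erasure_statements("dev.crm.customers", "email"):
    print(statement)
```

Note that a plain VACUUM honours the table's retention period (7 days by default in Delta Lake), so older versions can remain reachable through time travel until that window has passed.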
For full documentation, see the dedicated RTBF guide.
Data Subject Access Requests (DSAR)
Clone-Xs provides a GDPR Article 15 access request workflow that discovers a data subject's personal data across all cloned catalogs and exports it as CSV, JSON, or Parquet, with a full audit trail and 30-day deadline tracking.
DSAR reuses the same subject discovery engine as RTBF — the same PII column patterns, lineage tracking, and information_schema queries. The difference: DSAR runs SELECT + export instead of DELETE.
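Deadline tracking of this kind comes down to date arithmetic. A minimal sketch, assuming the 30-day window is counted in calendar days from the date the request is received (the function names are hypothetical):

```python
from datetime import date, timedelta

def dsar_deadline(received, days=30):
    """Due date for a request received on the given date."""
    return received + timedelta(days=days)

def days_remaining(received, today, days=30):
    """Days left before the deadline; negative once it has passed."""
    return (dsar_deadline(received, days) - today).days

print(dsar_deadline(date(2026, 3, 14)))                    # 2026-04-13
print(days_remaining(date(2026, 3, 14), date(2026, 4, 1)))  # 12
```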
Both RTBF and DSAR are available in the Governance portal, under the Compliance section of the sidebar.
For full documentation, see the dedicated DSAR guide.
Data Quality (DQX)
Clone-Xs includes a comprehensive data quality engine with 57+ check functions, table profiling, anomaly detection, and more. Key governance-related DQ capabilities:
- DQ Gate — block clone/sync operations when data quality falls below a threshold
- Check Audit Log — track who changed what check and when (required for regulated environments)
- DQ Coverage Report — measure what percentage of your data estate has quality checks
- Cross-Table Consistency Checks — referential integrity, aggregate matching, row count comparison
- Scheduled DQ Runs — cron-based recurring quality check execution
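The idea behind a DQ gate can be shown in a few lines. In this sketch, `dq_gate` is a hypothetical function, the pass rate is the share of checks that passed, the 0.95 default threshold is arbitrary, and an empty result set blocks the operation. None of these details are taken from the DQX engine itself.

```python
def dq_gate(results, threshold=0.95):
    """Decide whether an operation may proceed.

    results maps check name -> bool (True means the check passed).
    Returns (allowed, pass_rate). Assumption: no results means blocked.
    """
    if not results:
        return False, 0.0
    pass_rate = sum(results.values()) / len(results)
    return pass_rate >= threshold, pass_rate

checks = {"not_null_id": True, "unique_id": True,
          "row_count_match": False, "valid_email": True}
print(dq_gate(checks, threshold=0.9))  # (False, 0.75)
```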
For full documentation, see the dedicated Data Quality (DQX) guide.