
API Reference

Complete reference for the Clone-Xs REST API. Start the API server with clxs serve or make api-start.

Base URL: http://localhost:8080/api

Interactive docs: Once the server is running, visit http://localhost:8080/docs for Swagger UI or http://localhost:8080/redoc for ReDoc.

Authentication

All endpoints accept optional Databricks credentials via headers:

  • X-Databricks-Host: Workspace URL (e.g. https://adb-123456.azuredatabricks.net)
  • X-Databricks-Token: Personal access token

When running as a Databricks App, authentication is automatic via service principal. Otherwise, call POST /api/auth/login first or pass headers on each request.
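You can use the header-based option from any HTTP client. The following is a minimal Python sketch that uses only the standard library. The helper names build_request and send are illustrative and are not part of Clone-Xs. Each call attaches the two headers when credentials are supplied:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api"  # default base URL from this page


def build_request(method, path, body=None, host=None, token=None):
    """Build (but do not send) a request to the Clone-Xs API.

    host/token are optional; when given they become the X-Databricks-Host and
    X-Databricks-Token headers described above.
    """
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if host:
        headers["X-Databricks-Host"] = host
    if token:
        headers["X-Databricks-Token"] = token
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req, timeout=30):
    """Send a prepared request and decode its JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, against a running server, send(build_request("GET", "/auth/status")) should return the status document shown under Auth.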


Health

GET /api/health

Returns service health status and runtime environment.

Example request:

curl http://localhost:8080/api/health

Example response:

{
"status": "ok",
"service": "Clone-Xs",
"runtime": "standalone"
}
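When you script against a server started with clxs serve, it can help to wait for this endpoint before you issue other calls. The sketch below treats a connection error, or any status other than "ok", as "not ready yet". That interpretation is an assumption. Here, fetch is any zero-argument callable that returns the decoded body, for example one built on urllib:

```python
import time


def wait_for_health(fetch, timeout_s=60.0, interval_s=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll GET /api/health until it reports status "ok" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        try:
            body = fetch()
            if body.get("status") == "ok":
                return body
        except OSError:
            pass  # connection refused / URLError while the server is still starting
        if clock() >= deadline:
            raise TimeoutError("Clone-Xs API did not become healthy in time")
        sleep(interval_s)
```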

Auth

Endpoints for authenticating to Databricks workspaces via PAT, OAuth, service principal, Azure AD, or CLI profiles.

GET /api/auth/auto-login

Auto-login when running as a Databricks App (service principal injected). Returns 404 if not running as a Databricks App.

Example response:

{
"authenticated": true,
"user": "service-principal@company.com",
"host": "https://adb-123456.azuredatabricks.net",
"auth_method": "databricks-app"
}

POST /api/auth/login

Authenticate to a Databricks workspace with a personal access token.

| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Databricks workspace URL |
| token | string | Yes | Personal access token |

Example request:

curl -X POST http://localhost:8080/api/auth/login \
-H "Content-Type: application/json" \
-d '{"host": "https://adb-123456.azuredatabricks.net", "token": "dapi..."}'

Example response:

{
"authenticated": true,
"user": "user@company.com",
"host": "https://adb-123456.azuredatabricks.net",
"auth_method": "pat"
}

GET /api/auth/status

Check current authentication status.

Example response:

{
"authenticated": true,
"user": "user@company.com",
"host": "https://adb-123456.azuredatabricks.net",
"auth_method": "pat"
}

POST /api/auth/oauth-login

Trigger browser-based OAuth U2M login.

| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Databricks workspace URL |

GET /api/auth/profiles

List available Databricks CLI profiles from ~/.databrickscfg.

Example response:

[
{"name": "DEFAULT", "host": "https://adb-123456.azuredatabricks.net"},
{"name": "staging", "host": "https://adb-789012.azuredatabricks.net"}
]
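To cross-check what this endpoint should list, you can read the same file locally. The sketch below is an illustration only and is not the server's code. One configparser detail matters here. By default, [DEFAULT] is a special section whose keys every other section inherits. The sketch therefore creates the parser with a different default_section, which keeps DEFAULT as an ordinary profile:

```python
import configparser
import os


def list_cli_profiles(text=None, path="~/.databrickscfg"):
    """Return [{"name", "host"}] for each profile in a .databrickscfg file.

    default_section is moved off "DEFAULT" so [DEFAULT] is parsed as a normal
    profile and its keys do not leak into the other profiles.
    """
    parser = configparser.ConfigParser(default_section="__clxs_unused__", interpolation=None)
    if text is None:
        parser.read(os.path.expanduser(path))
    else:
        parser.read_string(text)
    return [{"name": name, "host": parser[name].get("host")} for name in parser.sections()]
```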

POST /api/auth/use-profile

Switch to a specific CLI profile.

| Field | Type | Required | Description |
|---|---|---|---|
| profile_name | string | Yes | CLI profile name |

POST /api/auth/service-principal

Authenticate with service principal credentials.

| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Databricks workspace URL |
| client_id | string | Yes | Service principal client ID |
| client_secret | string | Yes | Service principal client secret |
| tenant_id | string | No | Azure AD tenant ID (required for Azure) |
| auth_type | string | No | "databricks" or "azure" (default: "databricks") |
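The table implies one cross-field rule: tenant_id matters only for Azure service principals. The following hypothetical client-side helper enforces that rule before it calls the endpoint. The server still performs its own validation:

```python
def build_sp_login_body(host, client_id, client_secret, tenant_id=None, auth_type="databricks"):
    """Build the JSON body for POST /api/auth/service-principal."""
    if auth_type not in ("databricks", "azure"):
        raise ValueError(f"auth_type must be 'databricks' or 'azure', got {auth_type!r}")
    if auth_type == "azure" and not tenant_id:
        # The table marks tenant_id as required for Azure service principals.
        raise ValueError("tenant_id is required when auth_type='azure'")
    body = {"host": host, "client_id": client_id, "client_secret": client_secret, "auth_type": auth_type}
    if tenant_id:
        body["tenant_id"] = tenant_id
    return body
```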

POST /api/auth/azure-login

Trigger Azure CLI browser login (az login).

GET /api/auth/azure/tenants

List Azure tenants.

GET /api/auth/azure/subscriptions

List Azure subscriptions, optionally filtered by tenant.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| tenant_id | string | query | No | Filter by tenant |

GET /api/auth/azure/workspaces

List Databricks workspaces in an Azure subscription.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| subscription_id | string | query | Yes | Azure subscription ID |

POST /api/auth/azure/connect

Connect to a Databricks workspace discovered via Azure CLI auth.

| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Databricks workspace URL |

GET /api/auth/env-vars

Check which Databricks environment variables are set. Sensitive values are masked.

Example response:

{
"DATABRICKS_HOST": "https://adb-123456.azuredatabricks.net",
"DATABRICKS_TOKEN": "dapi...wxyz",
"DATABRICKS_CLIENT_ID": null,
"DATABRICKS_CLIENT_SECRET": null,
"AZURE_CLIENT_ID": null,
"AZURE_CLIENT_SECRET": null,
"AZURE_TENANT_ID": null,
"DATABRICKS_CONFIG_PROFILE": null
}

GET /api/auth/warehouses

List available SQL warehouses.

Example response:

[
{"id": "abc123", "name": "Starter Warehouse", "size": "Small", "state": "RUNNING", "type": "PRO"}
]

GET /api/auth/volumes

List available Unity Catalog volumes.

POST /api/auth/test-warehouse

Test a SQL warehouse by running SELECT 1. Run this before you submit a clone to check connectivity and permissions in a single round-trip.

Request body:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| warehouse_id | string | Yes | | SQL warehouse ID to test |

Response:

{ "status": "ok", "message": "Warehouse is reachable", "result": [{"1": 1}] }

POST /api/auth/logout

Clear the authentication cache and current session. Subsequent requests need to re-authenticate via /api/auth/login (or auto-login).

Response:

{ "status": "ok", "message": "Logged out successfully" }

GET /api/auth/serving-endpoints

List Databricks Model Serving endpoints. The AI-assistant and AI-narrative surfaces use this list to populate the model picker. Endpoints that are not in the READY state are filtered out.

Response:

{
"success": true,
"endpoints": [
{ "name": "databricks-meta-llama-3-1-405b", "state": "READY", "provider": "databricks", "is_claude": false },
{ "name": "claude-sonnet-4", "state": "READY", "provider": "anthropic", "is_claude": true }
]
}

GET /api/auth/genie-spaces

List Databricks Genie spaces (natural-language SQL surfaces). Populates the Genie space picker on the AI-assistant page.

Response:

{
"success": true,
"spaces": [
{ "space_id": "01ef…", "title": "Sales — Production", "description": "Genie space over `prod.sales`" }
]
}

Clone

Start clone jobs, track progress, list and cancel jobs. Uses CREATE TABLE ... CLONE under the hood.

POST /api/clone

Submit a clone job to the background queue.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog name |
| destination_catalog | string | Yes | | Destination catalog name |
| warehouse_id | string | No | From config | SQL warehouse ID |
| clone_type | string | No | "DEEP" | "DEEP" or "SHALLOW" |
| load_type | string | No | "FULL" | "FULL" or "INCREMENTAL" |
| dry_run | boolean | No | false | Preview without executing |
| max_workers | integer | No | 4 | Parallel thread count |
| parallel_tables | integer | No | 1 | Tables to clone simultaneously |
| include_schemas | string[] | No | [] | Only clone these schemas |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| include_tables_regex | string | No | | Regex filter for table names |
| exclude_tables_regex | string | No | | Regex to exclude table names |
| copy_permissions | boolean | No | true | Copy table permissions |
| copy_ownership | boolean | No | true | Copy table ownership |
| copy_tags | boolean | No | true | Copy Unity Catalog tags |
| copy_properties | boolean | No | true | Copy table properties |
| copy_security | boolean | No | true | Copy security settings |
| copy_constraints | boolean | No | true | Copy table constraints |
| copy_comments | boolean | No | true | Copy column/table comments |
| enable_rollback | boolean | No | true | Enable rollback logging |
| validate_after_clone | boolean | No | false | Run validation after clone |
| validate_checksum | boolean | No | false | Use checksums for validation |
| order_by_size | string | No | | "asc" or "desc" by table size |
| max_rps | float | No | 0 | Rate limit (requests per second) |
| as_of_timestamp | string | No | | Time-travel timestamp |
| as_of_version | integer | No | | Time-travel Delta version |
| location | string | No | | External location for catalog |
| serverless | boolean | No | false | Use serverless compute |
| volume | string | No | | UC Volume path for serverless |
| include_objects | object[] | No | | Partial-scope clone: a list of {schema, name, type} records, where type is table, view, function, or volume. The router translates these records into include_schemas plus an anchored include_tables_regex. Use this field instead of, or alongside, include_schemas when the UI Scope Picker is in "Select schemas + objects" mode. |
| target_workspace | object | No | | Cross-workspace migration; see Target Workspace. When set, the job is routed to the Delta Sharing + DEEP CLONE orchestrator (job_type=clone_cross_workspace). destination_catalog can then share the source name, because it lives on a different metastore. |
| clone_views | boolean | No | true | Cross-workspace only: re-issue view DDL on the target with catalog references rewritten. No effect on same-workspace clones, which always migrate views. |
| clone_functions | boolean | No | true | Cross-workspace only: re-issue SQL function DDL on the target. No effect on same-workspace clones. |
| clone_volumes | boolean | No | true | Cross-workspace only: recreate volumes and copy files via the Databricks Files API. No effect on same-workspace clones. |
| volume_max_file_mb | integer | No | 500 | Cross-workspace only: per-file cap (MB) for managed-volume file copy. Larger files are skipped with a warning. |
| max_duration_min | integer | No | | Runtime guardrail: abort the clone if wall-clock time exceeds this many minutes. Checked between schemas. |
| max_tables | integer | No | | Runtime guardrail: abort after this many tables have been touched. Checked between schemas. |
| source_snapshot_id | string | No | | UUID of a row in <audit>.clone_snapshots. When set, it is resolved to as_of_timestamp so that every table clones from the snapshot's captured state. See Clone Snapshots. |
| target_format | string | No | "DELTA" | "DELTA" or "ICEBERG". With "ICEBERG", the destination stays Delta, but UniForm metadata is enabled after the clone (delta.universalFormat.enabledFormats=iceberg + IcebergCompatV2 + columnMapping=name). External Iceberg engines can then read the table without a copy. Only effective on Delta sources; non-Delta sources are skipped with a WARN. See clone guide: target format. |
| iceberg_physical | boolean | No | false | Only meaningful with target_format="ICEBERG". When true, replaces the UniForm path with CREATE TABLE … USING iceberg AS SELECT …, so UC reports the destination as Data source: Iceberg. This path loses Delta history, ignores time-travel arguments with a WARN, and requires DBR 15+ with Iceberg-managed-table support. See clone guide: physical Iceberg target. |
| auto_mask_pii | boolean | No | false | Auto-detect PII columns via UC column_tags (EMAIL / SSN / CREDIT_CARD / PHONE / etc.) and mask them on the destination via the existing src/masking.py pipeline. Masking runs as a post-clone UPDATE, so the clone job bounds the window during which unmasked data sits on the destination. See clone guide: auto-mask PII. |
| enable_retry | boolean | No | true | Auto-retry transient clone failures (network errors, throttling such as HTTP 429, and 5xx responses) with exponential backoff. Logical errors (schema mismatch, permission, validation) are never retried. Bounded by max_retries (config, default 3). |
| compare_dq_after_clone | boolean | No | false | Run a column-level DQ comparison after each schema clones: row count plus per-column NULL counts on source vs. target. When combined with auto_rollback_on_failure, a maximum drift above dq_drift_rollback_pct triggers a Delta RESTORE. Adds one warehouse round-trip per cloned table. |
| dq_drift_rollback_pct | float | No | 5.0 | Drift threshold (0–100) for compare_dq_after_clone. Matches the existing row-count rollback_threshold, so operators have one mental model for "acceptable drift." |
| where_clauses | object | No | {} | Per-table predicate filter, e.g. {"bronze.events": "date >= '2026-01-01'", "*": "is_deleted = false"}. Forces the per-table CLONE onto a CTAS path (CREATE TABLE … AS SELECT * FROM src WHERE …), which loses Delta source history. DEEP-only; ignored on SHALLOW with a WARN. See clone guide: WHERE-clause filtered clone. |
| clone_tbl_properties | object | No | {} | Inline TBLPROPERTIES (...) rendered onto every per-table CLONE statement (e.g. {"delta.logRetentionDuration": "3650 days"}). Required for properties that must be set on the first commit; setting them via ALTER TABLE after the clone is too late. See clone guide: inline TBLPROPERTIES. |
| quiesce_source | boolean | No | false | Pre-clone source quiesce: snapshot and revoke write privileges on the source schemas at clone start, then re-grant them in a finally block at clone end. Prevents concurrent writes from landing mid-clone. See clone guide: pre-clone quiesce. |
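The following hypothetical sketch illustrates the include_objects translation; it is not the router's actual code. It ignores each record's type and assumes that include_tables_regex is matched against bare object names. Under that assumption, selecting bronze.orders would also match silver.orders whenever silver is in include_schemas. Check this with dry_run first:

```python
import re


def translate_include_objects(include_objects):
    """Hypothetical: derive include_schemas and an anchored include_tables_regex."""
    schemas = sorted({obj["schema"] for obj in include_objects})
    names = sorted({obj["name"] for obj in include_objects})
    # ^(?:a|b)$ anchors both ends so "orders" does not also match "orders_v2";
    # re.escape keeps characters such as "." literal.
    pattern = "^(?:" + "|".join(re.escape(n) for n in names) + ")$" if names else None
    return {"include_schemas": schemas, "include_tables_regex": pattern}
```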

Example request:

curl -X POST http://localhost:8080/api/clone \
-H "Content-Type: application/json" \
-d '{
"source_catalog": "prod",
"destination_catalog": "prod_clone",
"clone_type": "DEEP",
"dry_run": false
}'
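If you assemble requests in code rather than by hand, you can check a few of the documented constraints locally. The Python sketch below uses an illustrative helper name. It raises errors only for invalid enumerated values. Where the server is documented to ignore a setting rather than reject it, the sketch only issues a warning:

```python
import warnings


def build_clone_body(source_catalog, destination_catalog, clone_type="DEEP",
                     load_type="FULL", target_format="DELTA", **options):
    """Assemble a POST /api/clone body; extra keyword arguments pass through unchanged."""
    if clone_type not in ("DEEP", "SHALLOW"):
        raise ValueError(f"clone_type must be 'DEEP' or 'SHALLOW', got {clone_type!r}")
    if load_type not in ("FULL", "INCREMENTAL"):
        raise ValueError(f"load_type must be 'FULL' or 'INCREMENTAL', got {load_type!r}")
    if target_format not in ("DELTA", "ICEBERG"):
        raise ValueError(f"target_format must be 'DELTA' or 'ICEBERG', got {target_format!r}")
    if options.get("where_clauses") and clone_type == "SHALLOW":
        # The server ignores where_clauses on SHALLOW clones (with a WARN); flag it early.
        warnings.warn("where_clauses is DEEP-only and is ignored for SHALLOW clones")
    if options.get("iceberg_physical") and target_format != "ICEBERG":
        warnings.warn("iceberg_physical only has an effect with target_format='ICEBERG'")
    return {
        "source_catalog": source_catalog,
        "destination_catalog": destination_catalog,
        "clone_type": clone_type,
        "load_type": load_type,
        "target_format": target_format,
        **options,
    }
```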

Example response:

{
"job_id": "a1b2c3d4",
"status": "queued",
"message": "Clone job submitted"
}

GET /api/clone/jobs

List all clone jobs and their statuses.

Example response:

[
{
"job_id": "a1b2c3d4",
"status": "running",
"source_catalog": "prod",
"destination_catalog": "prod_clone",
"clone_type": "DEEP",
"progress": {"completed": 12, "total": 50},
"created_at": "2025-01-15T10:30:00Z"
}
]

GET /api/clone/{job_id}

Get status and details for a specific clone job.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| job_id | string | path | Yes | Job ID |

Example response:

{
"job_id": "a1b2c3d4",
"status": "completed",
"source_catalog": "prod",
"destination_catalog": "prod_clone",
"progress": {"completed": 50, "total": 50},
"result": {"tables_cloned": 50, "tables_failed": 0},
"logs": ["Cloning schema1.table1...", "Done."],
"created_at": "2025-01-15T10:30:00Z",
"completed_at": "2025-01-15T10:45:00Z"
}
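Submission only queues the job, so a script typically polls this endpoint until the job finishes. The WebSocket described below is the push-based alternative. In this sketch, fetch_job is any callable that performs the GET and returns the decoded JSON. The set of terminal statuses is an assumption, because only completed and cancelled appear on this page:

```python
import time

# "completed" and "cancelled" appear in the examples above; "failed" is assumed.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_job, job_id, interval_s=5.0, sleep=time.sleep, on_progress=None):
    """Poll GET /api/clone/{job_id} until the job reaches a terminal status."""
    while True:
        job = fetch_job(job_id)
        progress = job.get("progress") or {}
        total = progress.get("total") or 0
        if on_progress is not None and total:
            # Report percent complete from the {"completed", "total"} counters.
            on_progress(100.0 * progress.get("completed", 0) / total)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
```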

DELETE /api/clone/{job_id}

Cancel a running or queued clone job.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| job_id | string | path | Yes | Job ID |

Example response:

{"status": "cancelled", "job_id": "a1b2c3d4"}

WebSocket /api/clone/ws/{job_id}

WebSocket endpoint for live clone progress updates. Send "ping" to keep the connection alive; receive JSON progress events.


Convert to Delta

In-place format conversion from Parquet or Iceberg to Delta. This endpoint differs from /api/clone in two ways. First, the operation is destructive on the source: there is no destination FQN, and the same FQN keeps pointing at the same data while the underlying format changes. Second, the operation is synchronous, with no job queue, because typical workloads involve only a handful of tables and operators want immediate feedback.

See Convert table format guide for ergonomics, when to use this vs. clone, and limitations.

POST /api/convert-to-delta

Convert one or more UC-registered tables in-place from Parquet or Iceberg to Delta. Two-layer safety gate: a Pydantic validator on the request and a module-level check in the orchestrator. Without confirm_destructive: true (and without dry_run: true) the endpoint returns 422.

Request body:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| targets | object[] | Yes | | At least one. Each target is {fqn, source_format}, where fqn is "catalog.schema.table" and source_format is "ICEBERG", "PARQUET", or "DELTA". Already-Delta and unsupported formats are skipped without hitting the warehouse. |
| warehouse_id | string | No | From config | SQL warehouse to execute the DDL on. |
| confirm_destructive | boolean | Required unless dry_run | false | Explicit acknowledgement that the source table will be rewritten. The server returns 422 if it is missing on a non-dry-run request. |
| dry_run | boolean | No | false | Logs the SQL but doesn't execute it. Bypasses the confirmation gate, so wizard previews are safe. |

Per-target behaviour:

| Source data_source_format / table_type | Action |
|---|---|
| ICEBERG or PARQUET (MANAGED / EXTERNAL) | Runs CONVERT TO DELTA `catalog`.`schema`.`table` |
| Already DELTA | Skipped, no SQL emitted |
| STREAMING_TABLE / MATERIALIZED_VIEW / VIEW | Skipped, no SQL emitted (pipeline-owned tables; views have no underlying files) |
| Unsupported format (CSV, JSON, etc.) | Skipped, no SQL emitted |

Response (200):

{
"total": 2,
"converted": 1,
"failed": 1,
"skipped": 0,
"results": [
{"fqn": "edp_dev.bronze.events_iceberg", "source_format": "ICEBERG",
"status": "converted", "duration_ms": 14820, "error": null},
{"fqn": "edp_dev.bronze.legacy_parquet", "source_format": "PARQUET",
"status": "failed", "duration_ms": 121, "error": "USE CATALOG required"}
]
}

The endpoint returns 200 with partial results when some targets fail — operators read per-target status to decide whether to re-submit just the failures.
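Picking the failures out of a response for resubmission is a simple filter over results. A minimal sketch (the helper name is illustrative):

```python
def failed_targets(response):
    """Collect failed entries from a /api/convert-to-delta response, shaped as request targets."""
    return [
        {"fqn": r["fqn"], "source_format": r["source_format"]}
        for r in response.get("results", [])
        if r.get("status") == "failed"
    ]
```

Once the underlying cause is fixed (for example, the missing USE CATALOG grant in the example above), you can post the returned list back as {"targets": failed_targets(resp), "confirm_destructive": true}.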

Status codes:

| Code | Cause |
|---|---|
| 200 | Batch processed (some targets may still have failed; check results[].status) |
| 400 | warehouse_id missing (request and default config both empty) |
| 422 | Validation failed: confirm_destructive and dry_run are both false, or targets is empty |

Audit trail:

Each batch generates one operation_id (UUID). Per-target rows are written to <audit_catalog>.logs.convert_operations, a sibling of the existing clone_operations table. Each row records status, source_format, dry_run, duration, and error. Audit-table initialization is best-effort: if the audit table can't be created, the conversion proceeds without an audit record. See Audit for the schema.

Example (dry-run preview):

curl -X POST http://localhost:8080/api/convert-to-delta \
-H "Content-Type: application/json" \
-d '{
"targets": [
{"fqn": "edp_dev.bronze.events", "source_format": "ICEBERG"}
],
"warehouse_id": "abc123",
"dry_run": true
}'

Example (real conversion):

curl -X POST http://localhost:8080/api/convert-to-delta \
-H "Content-Type: application/json" \
-d '{
"targets": [
{"fqn": "edp_dev.bronze.events", "source_format": "ICEBERG"}
],
"warehouse_id": "abc123",
"confirm_destructive": true
}'
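A client can catch, before sending, the same requests that the server rejects with 422. This sketch uses a hypothetical helper that is not part of Clone-Xs. It mirrors the two documented conditions: empty targets, and a live run without confirm_destructive.

```python
def build_convert_body(targets, warehouse_id=None, dry_run=False, confirm_destructive=False):
    """Build a POST /api/convert-to-delta body, failing locally on documented 422 cases."""
    if not targets:
        raise ValueError("targets must contain at least one entry")
    if not dry_run and not confirm_destructive:
        raise ValueError("a live conversion needs confirm_destructive=True (or use dry_run=True)")
    body = {"targets": [dict(t) for t in targets]}
    if warehouse_id:
        body["warehouse_id"] = warehouse_id
    if dry_run:
        body["dry_run"] = True
    if confirm_destructive:
        body["confirm_destructive"] = True
    return body
```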

GET /api/convert-to-delta/history

List rows from the convert_operations audit table, newest first. One row per (operation_id, fqn) — a batch of N targets produces N rows linked by operation_id. Empty array (200) when the audit table doesn't exist yet (fresh workspace) — operators shouldn't see an error in the wizard's Recent Runs panel just because no convert has run yet.

Query parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | integer | No | 50 | Max rows. Hard-capped at 1000 server-side to protect the warehouse. |
| status | string | No | | Filter by converted / failed / skipped. |
| fqn_like | string | No | | SQL LIKE pattern on the fqn column, e.g. "edp.bronze.%" for everything in one schema. |
| dry_run | boolean | No | | Filter to dry-run rows (true) or live rows (false). |
| operation_id | string | No | | Pull every row in one batch, given its UUID. |
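These parameters combine into an ordinary query string. The sketch below makes two assumptions: booleans are sent as lowercase true/false, and the client clamps limit to the documented 1000 cap. The helper name is illustrative. In SQL LIKE, _ is also a single-character wildcard, so a pattern such as "edp_dev.%" can match slightly more than the literal prefix.

```python
from urllib.parse import urlencode


def history_url(base="http://localhost:8080/api", limit=50, status=None,
                fqn_like=None, dry_run=None, operation_id=None):
    """Build the GET /api/convert-to-delta/history URL from optional filters."""
    params = {"limit": min(int(limit), 1000)}  # the server caps at 1000 anyway
    if status is not None:
        if status not in ("converted", "failed", "skipped"):
            raise ValueError(f"unknown status {status!r}")
        params["status"] = status
    if fqn_like is not None:
        params["fqn_like"] = fqn_like  # urlencode escapes % as %25
    if dry_run is not None:
        params["dry_run"] = "true" if dry_run else "false"
    if operation_id is not None:
        params["operation_id"] = operation_id
    return f"{base}/convert-to-delta/history?{urlencode(params)}"
```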

Response (200):

{
"rows": [
{
"operation_id": "7f3a-...",
"fqn": "edp_dev.bronze.events_iceberg",
"source_format": "ICEBERG",
"status": "converted",
"started_at": "2026-05-02 10:00:00",
"completed_at": "2026-05-02 10:00:12",
"duration_ms": 12480,
"user_name": "viral",
"host": "https://adb-….azuredatabricks.net",
"dry_run": false,
"trigger": "manual",
"error_message": null,
"recorded_at": "2026-05-02 10:00:12"
}
],
"count": 1
}

Status codes:

| Code | Cause |
|---|---|
| 200 | Returned (rows may be empty). |
| 400 | warehouse_id is missing from the app config, and this endpoint cannot set it. Set the default in clone_config.yaml or via the Settings page. |

GET /api/catalogs/{catalog}/{schema}/tables/with-format

List tables in a UC schema with their table_type and data_source_format. Distinct from the bare /api/catalogs/{catalog}/{schema}/tables endpoint (which returns names only) — this one is consumed by the Convert to Delta wizard's picker so it can show format badges and disable already-Delta / non-convertible rows without a second round-trip.

Path parameters:

| Parameter | Type | Description |
|---|---|---|
| catalog | string | UC catalog name |
| schema | string | UC schema name |

Response (200):

[
{"name": "events_iceberg", "table_type": "EXTERNAL", "data_source_format": "ICEBERG"},
{"name": "events_parquet", "table_type": "EXTERNAL", "data_source_format": "PARQUET"},
{"name": "users", "table_type": "MANAGED", "data_source_format": "DELTA"},
{"name": "bronze_pos_terminal","table_type": "STREAMING_TABLE", "data_source_format": "DELTA"}
]

The data_source_format field is normalised to a string at the client boundary (src/client.py:_normalize_format) — the SDK's DataSourceFormat enum is unwrapped to its .value so consumers can .toUpperCase() / compare against "DELTA" directly.
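You can approximate the wizard's filtering from this response and the per-target behaviour table under POST /api/convert-to-delta. Keep the ICEBERG and PARQUET rows whose table_type is MANAGED or EXTERNAL, then shape them as conversion targets. A sketch, with an illustrative helper name:

```python
CONVERTIBLE_FORMATS = {"ICEBERG", "PARQUET"}
CONVERTIBLE_TYPES = {"MANAGED", "EXTERNAL"}


def convert_targets(catalog, schema, rows):
    """Turn /tables/with-format rows into convert-to-delta targets, skipping non-convertible ones."""
    return [
        {"fqn": f"{catalog}.{schema}.{r['name']}", "source_format": r["data_source_format"].upper()}
        for r in rows
        if (r.get("data_source_format") or "").upper() in CONVERTIBLE_FORMATS
        and r.get("table_type") in CONVERTIBLE_TYPES
    ]
```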


Target Workspace

Endpoints for cross-workspace / cross-cloud catalog migration. See the Cross-workspace clone guide for the full pipeline.

POST /api/target/validate

Verify credentials for a target workspace and read its metastore sharing identifier. Call this before POST /api/clone with target_workspace to fail fast on bad creds.

Request body — the TargetWorkspace model:

| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Full workspace URL (must start with https://) |
| auth_method | string | No | "pat" (default), "service_principal", or "profile" |
| token | string | Cond. | Required when auth_method="pat" |
| client_id | string | Cond. | Required when auth_method="service_principal" |
| client_secret | string | Cond. | Required when auth_method="service_principal" |
| profile | string | Cond. | CLI profile name (from ~/.databrickscfg); required when auth_method="profile" |
| warehouse_id | string | Yes | Target SQL warehouse that will run DDL + DEEP CLONE |
| keep_share | boolean | No | Legacy/informational: leave the Delta Share intact after migration (default false). Prefer cleanup_after_clone for new code. |
| data_sync_mode | string | No | How re-runs treat existing target tables: "snapshot_once" (default; CREATE IF NOT EXISTS), "incremental" (CREATE OR REPLACE; mirrors source updates and overwrites target writes), or "force_full" (DROP + CREATE every run). |
| auto_handle_masks | boolean | No | When true, Clone-Xs drops column masks / row filters on the source so that masked tables can be added to the share. It then re-applies them on the target after the clone, and (for snapshot_once / force_full) restores them on the source in the finally block. In incremental mode, source masks are left dropped. Default false. |
| cleanup_after_clone | boolean | No | Drop the deterministic share / recipient / shared catalog at the end of the run. Defaults to false, so these objects persist between runs and later re-clones reuse them (true incremental sync). Set to true for one-shot migrations. |
| prune_share_extras | boolean | No | When true, re-runs also issue ALTER SHARE … REMOVE TABLE for tables that are in the share but no longer exist in the source. Defaults to false because pruning is destructive on the share side. |
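The conditional rules above are easy to get wrong when you assemble credentials by hand. A local pre-check can save a round-trip before POST /api/target/validate. The following is a hypothetical client-side mirror of the table; the server's TargetWorkspace model remains authoritative:

```python
REQUIRED_BY_AUTH = {
    "pat": ("token",),
    "service_principal": ("client_id", "client_secret"),
    "profile": ("profile",),
}
SYNC_MODES = {"snapshot_once", "incremental", "force_full"}


def check_target_workspace(tw):
    """Return a list of problems with a TargetWorkspace dict (empty list = looks valid)."""
    problems = []
    if not str(tw.get("host", "")).startswith("https://"):
        problems.append("host must start with https://")
    if not tw.get("warehouse_id"):
        problems.append("warehouse_id is required")
    method = tw.get("auth_method", "pat")
    required = REQUIRED_BY_AUTH.get(method)
    if required is None:
        problems.append(f"unknown auth_method {method!r}")
    else:
        problems += [f"{f} is required when auth_method={method!r}" for f in required if not tw.get(f)]
    if tw.get("data_sync_mode", "snapshot_once") not in SYNC_MODES:
        problems.append(f"data_sync_mode must be one of {sorted(SYNC_MODES)}")
    return problems
```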

Example request:

curl -X POST http://localhost:8080/api/target/validate \
-H "Content-Type: application/json" \
-d '{
"host": "https://adb-target.azuredatabricks.net",
"auth_method": "pat",
"token": "dapi...",
"warehouse_id": "abc123"
}'

Example response (success):

{
"ok": true,
"host": "https://adb-target.azuredatabricks.net",
"user": "data_engineering@example.com",
"catalog_count": 14,
"metastore_sharing_id": "azure:eastus:a1b2c3d4-...",
"sharing_error": null,
"warehouse_state": "RUNNING",
"warehouse_name": "Serverless Starter Warehouse",
"warehouse_start_triggered": false
}

Response fields beyond ok/host:

| Field | Description |
|---|---|
| user | Authenticated identity on the target (from client.current_user.me()). The UI shows it as "Logged in as ..." so that wrong-token mistakes are easy to spot early. |
| catalog_count | Number of catalogs the credentials can list; a quick "is this account healthy?" signal. |
| metastore_sharing_id | Target metastore's global_metastore_id (<cloud>:<region>:<uuid> format). Used as the recipient USING ID on the source. |
| sharing_error | Non-null when auth works but metastore introspection failed. Cross-workspace clone may then need manual Delta Sharing setup. |
| warehouse_state | One of RUNNING / STARTING / STOPPED / STOPPING / DELETED. Validation also fails if warehouse_id doesn't exist. |
| warehouse_name | Databricks display name for the supplied warehouse_id; useful for catching a mistyped ID. |
| warehouse_start_triggered | true when the warehouse was STOPPED / STOPPING and the endpoint fired a non-blocking warehouses.start(), so that the warehouse should be RUNNING by clone time. |

Responses:

| Status | Meaning |
|---|---|
| 200 | Credentials work and the warehouse exists. The body fields above describe the target. |
| 400 | Request body violates the TargetWorkspace schema (e.g. missing PAT when auth_method="pat"), or the supplied warehouse_id is not visible in the target workspace. |
| 401 | Authentication failed: bad host, invalid token, or unreachable workspace. The error message is in the detail field. |

POST /api/target/warehouses

List SQL warehouses available in a target workspace. Used by the UI to populate the warehouse dropdown after the user enters host + auth, before they pick a warehouse_id.

Request body: TargetWorkspaceConnect (same as TargetWorkspace but without warehouse_id).

Example response:

[
{"id": "abc123", "name": "Serverless Starter Warehouse", "size": "Small", "type": "SERVERLESS", "state": "RUNNING"},
{"id": "def456", "name": "Pro Warehouse", "size": "Medium", "type": "PRO", "state": "STOPPED"}
]

POST /api/target/catalogs

List catalog names that exist in a target workspace. Used by the /clone Destination Catalog dropdown when "Clone to a different workspace" is enabled — so the user picks an existing target catalog (or + Create New), instead of seeing source-side catalogs.

Request body: TargetWorkspaceConnect (same as /api/target/warehouses).

Example response:

["analytics_prod", "main", "samples", "system"]

POST /api/target/whoami

Lightweight identity check — returns just the authenticated user for the supplied target creds. Calls client.current_user.me() only (no warehouse, no metastore lookup, no catalog list), so it's fast enough to fire on /settings → Target Workspaces page mount for every saved connection.

Request body: TargetWorkspaceConnect.

Example response:

{
"user": "data_engineering@example.com",
"host": "https://adb-target.azuredatabricks.net"
}

Responses: 200 on success, 400 on schema violation, 401 on auth failure (wraps the underlying SDK error in detail).


A note on credential storage

The /api/target/* endpoints are stateless. Saved target connections in the UI live in browser localStorage (key clxs_target_connections); per-clone requests resolve the picked entry to inline credentials and POST them. Nothing about target workspaces persists on the server — neither in clone_config.yaml nor in any database. This avoids a class of "leaked-token-to-git" mistakes that the legacy yaml-based persistence enabled.


Clone Snapshots

Named fork points for point-in-time clones. See Clone Snapshots guide for the full flow. Requires audit_trail.catalog to be configured — snapshots live in a Delta table in that catalog.

POST /api/clone-snapshots

Capture a named snapshot of a catalog's current Delta-version state.

| Field | Type | Required | Description |
|---|---|---|---|
| source_catalog | string | Yes | Catalog to capture |
| name | string | Yes | Human-readable label |
| description | string | No | Free-text context shown in listings |
| exclude_schemas | string[] | No | Schemas to skip; defaults to ["information_schema", "default"] |

Response (200):

{
"snapshot_id": "7f3a4b5c-8d2e-4a1f-b9d3-...",
"name": "pre-migration",
"source_catalog": "prod",
"description": "Captured before 2026-04 refactor",
"captured_at": "2026-04-19T14:30:00Z",
"created_by": "alice@example.com",
"table_count": 611,
"total_bytes": 2574326784
}

Errors: 400 if audit_trail.catalog or sql_warehouse_id is missing.

GET /api/clone-snapshots

List all snapshots, newest first.

| Query | Type | Description |
|---|---|---|
| source_catalog | string (optional) | Filter to snapshots captured from this catalog |

Response is an array of the shape above (without tables_json).

GET /api/clone-snapshots/{snapshot_id}

Return one snapshot including the parsed per-table list:

{
"snapshot_id": "...",
"name": "pre-migration",
"table_count": 611,
"tables": [
{ "schema": "bronze", "table": "orders", "version": 42, "size_bytes": 1073741824 },
{ "schema": "bronze", "table": "customers", "version": 8, "size_bytes": 268435456 }
]
}

Returns 404 if snapshot_id is not found.

DELETE /api/clone-snapshots/{snapshot_id}

Remove a snapshot row. Idempotent — returns {snapshot_id, deleted: true} whether or not the row existed.


Analysis

Diff, validate, stats, search, profile, cost estimation, storage metrics, table maintenance, and metadata export.

POST /api/diff

Compare two catalogs at the object level. Returns missing, extra, and matching schemas/tables/views.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

Example request:

{
"source_catalog": "prod",
"destination_catalog": "prod_clone"
}

Example response:

{
"missing_schemas": ["analytics"],
"extra_schemas": [],
"matching_schemas": ["sales", "hr"],
"missing_tables": ["sales.orders_v2"],
"extra_tables": [],
"matching_tables": ["sales.orders", "hr.employees"]
}

POST /api/compare

Deep column-level comparison of two catalogs. Compares column names, data types, nullability, and ordering.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

POST /api/validate

Validate a clone by comparing row counts and optionally checksums.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| use_checksum | boolean | No | false | Compare hash-based checksums |
| max_workers | integer | No | 4 | Parallel thread count |

Example request:

curl -X POST http://localhost:8080/api/validate \
-H "Content-Type: application/json" \
-d '{"source_catalog": "prod", "destination_catalog": "prod_clone", "use_checksum": true}'

POST /api/schema-drift

Detect schema drift between two catalogs. Identifies added, removed, and modified columns.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

POST /api/stats

Get catalog statistics: sizes, row counts, file counts, and top tables.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to analyze |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

Example request:

{"source_catalog": "prod"}

POST /api/search

Search for tables and columns matching a regex pattern.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to search |
| pattern | string | Yes | | Regex pattern to match |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| search_columns | boolean | No | false | Also search column names |

Example request:

{"source_catalog": "prod", "pattern": ".*email.*", "search_columns": true}

POST /api/profile

Profile data quality across a catalog. Computes per-column statistics: null count, distinct count, min/max values.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to profile |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| max_workers | integer | No | 4 | Parallel thread count |
| output_path | string | No | | Save results to file |

POST /api/estimate

Estimate storage and compute costs for a clone operation.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to estimate |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| include_schemas | string[] | No | | Only include these schemas |
| price_per_gb | float | No | 0.023 | Storage price per GB |
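You can check the storage side of an estimate by hand: cost is total bytes divided by bytes per GB, multiplied by price_per_gb. This page does not say whether a GB here means 2^30 or 10^9 bytes, or which billing period the price covers. The sketch therefore makes the unit a parameter and defaults to GiB:

```python
def estimate_storage_cost(total_bytes, price_per_gb=0.023, bytes_per_gb=1024**3):
    """Storage cost = total_bytes / bytes_per_gb * price_per_gb.

    bytes_per_gb defaults to 1024**3 (GiB), which is an assumption; pass 10**9 for decimal GB.
    """
    return total_bytes / bytes_per_gb * price_per_gb
```

Take the total_bytes value from the clone-snapshot example: 2574326784 bytes, about 2.40 GiB. At the default price, this comes to roughly $0.055.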

POST /api/storage-metrics

Analyze per-table storage breakdown (active, vacuumable, time-travel bytes).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to analyze |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| schema_filter | string | No | | Filter to specific schema |
| table_filter | string | No | | Filter to specific table |

POST /api/optimize

Run OPTIMIZE on selected tables to compact small files.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog containing tables |
| warehouse_id | string | No | | SQL warehouse ID |
| tables | array | No | | Specific tables: [{"schema":"x","table":"y"}] |
| schema_filter | string | No | | Filter to a schema (when tables is omitted) |
| dry_run | boolean | No | false | Preview without executing |

POST /api/vacuum

Run VACUUM on selected tables to reclaim storage from old files.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog containing tables |
| warehouse_id | string | No | | SQL warehouse ID |
| tables | array | No | | Specific tables: [{"schema":"x","table":"y"}] |
| schema_filter | string | No | | Filter to a schema (when tables is omitted) |
| retention_hours | integer | No | 168 | Retention period in hours (default 7 days) |
| dry_run | boolean | No | false | Preview without executing |
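By default, Delta itself rejects a VACUUM retention shorter than the table's delta.deletedFileRetentionDuration (7 days unless changed), unless spark.databricks.delta.retentionDurationCheck.enabled is turned off. Short retention also removes time-travel history. The following client-side guard works in the same spirit. allow_short_retention is a hypothetical opt-out flag in this sketch, not a Clone-Xs field:

```python
DELTA_DEFAULT_RETENTION_HOURS = 168  # Delta's default deleted-file retention (7 days)


def build_vacuum_body(source_catalog, tables=None, schema_filter=None,
                      retention_hours=168, dry_run=False, allow_short_retention=False):
    """Build a POST /api/vacuum body, refusing short retention unless explicitly allowed."""
    if retention_hours < DELTA_DEFAULT_RETENTION_HOURS and not allow_short_retention:
        raise ValueError(
            f"retention_hours={retention_hours} is below {DELTA_DEFAULT_RETENTION_HOURS}; "
            "older versions become unreachable by time travel"
        )
    body = {"source_catalog": source_catalog, "retention_hours": retention_hours, "dry_run": dry_run}
    if tables:
        body["tables"] = [{"schema": t["schema"], "table": t["table"]} for t in tables]
    elif schema_filter:
        body["schema_filter"] = schema_filter  # only used when tables is omitted
    return body
```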

POST /api/check-predictive-optimization

Check if Predictive Optimization is enabled for a catalog. When enabled, manual OPTIMIZE/VACUUM may be unnecessary.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Catalog to check |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

POST /api/export

Export catalog metadata to CSV or JSON.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Catalog to export |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| format | string | No | "csv" | "csv" or "json" |
| output_path | string | No |  | Custom output file path |

Example response:

{"output_path": "exports/prod_metadata.csv"}

POST /api/snapshot

Create a point-in-time metadata snapshot of a catalog. Useful for before/after clone comparison.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Catalog to snapshot |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| output_path | string | No |  | Custom output file path |

GET /api/catalog-size-history

Returns per-catalog daily size snapshots for the last N days (set with the days parameter). Powers the storage-trend chart on the FinOps page. Reads from the <audit>.metrics.catalog_size_daily Delta table, which the scheduled storage-metrics collector populates.

Query parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| catalogs | string | No | (all) | Comma-separated list of catalogs to include (e.g. ?catalogs=prod,prod_eu) |
| days | integer | No | 30 | Look-back window in days (1–365) |

Response:

{
"rows": [
{ "catalog": "prod", "date": "2026-04-01", "total_bytes": 1234567890123, "total_tables": 412 }
],
"days": 30
}
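A client typically builds the query string and then reduces the rows to a per-catalog growth figure. The sketch below uses only the documented parameters and response fields. `size_history_url` and `growth_by_catalog` are hypothetical helpers. The 1–365 check mirrors the documented range and is not the server's own validation.

```python
from urllib.parse import urlencode


def size_history_url(catalogs=None, days=30, base_url="http://localhost:8080/api"):
    """Build the GET /api/catalog-size-history URL; days must be within 1-365."""
    if not 1 <= days <= 365:
        raise ValueError("days must be between 1 and 365")
    params = {"days": days}
    if catalogs:
        params["catalogs"] = ",".join(catalogs)  # comma-separated, per the table above
    return f"{base_url}/catalog-size-history?{urlencode(params)}"


def growth_by_catalog(rows):
    """Return {catalog: newest total_bytes - oldest total_bytes} over the window."""
    first, last = {}, {}
    for row in sorted(rows, key=lambda r: r["date"]):  # ISO dates sort as strings
        first.setdefault(row["catalog"], row["total_bytes"])
        last[row["catalog"]] = row["total_bytes"]
    return {cat: last[cat] - first[cat] for cat in first}
```

urlencode percent-encodes the comma as %2C. The server decodes it back to prod,prod_eu.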

POST /api/permissions-audit

Bulk-audit GRANTs across a catalog and surface risky patterns. Queries <catalog>.information_schema.table_privileges and classifies findings as CRITICAL, HIGH, MEDIUM, or LOW based on public-group membership, privilege blast radius, and (optionally) a PII overlay.

Request body:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Catalog to audit |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| pii_intersection | boolean | No | false | When true, runs PII detection inline and escalates findings on PII-bearing tables |

Response:

{
"audit_results": [
{ "risk_level": "CRITICAL", "principal": "account users", "table_fqn": "prod.sales.customers",
"privilege": "ALL", "is_public_group": true, "suggested_action": "Revoke ALL from public group" }
],
"summary": { "total_findings": 14, "critical_count": 2, "high_count": 4, "medium_count": 6, "low_count": 2 }
}

POST /api/diff-detail

Detailed cross-catalog diff combining object presence/absence, column drift, and size deltas. Returns the object-level diff, a drift list of tables present in both catalogs that differ in columns or size, and a summary rollup for the headline cards on the diff-and-compare UI.

Request body:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Source catalog to compare |
| destination_catalog | string | Yes |  | Destination catalog to compare against |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

Response:

{
"schemas": { "missing": [], "extra": [], "matching": ["sales", "hr"] },
"tables": { "missing": ["sales.orders_v2"], "extra": [], "matching": ["sales.orders", "hr.employees"] },
"drift": [
{ "table_fqn": "sales.orders", "source_columns": 12, "dest_columns": 11,
"added_columns": [], "removed_columns": ["legacy_flag"], "size_delta_bytes": -1024000 }
],
"summary": { "total_matching_tables": 2, "tables_with_drift": 1, "total_size_source_bytes": 0, "total_size_dest_bytes": 0 },
"drift_errors": []
}
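The sketch below shows how one drift entry of the shape above could be derived from two column lists and two table sizes. It reflects two assumptions, both consistent with the example: "added" means present only in the destination, and size_delta_bytes is destination minus source. `drift_entry` is hypothetical and is not the server's implementation.

```python
def drift_entry(table_fqn, source_columns, dest_columns, source_bytes, dest_bytes):
    """Build one drift record shaped like the diff-detail response (sketch)."""
    src, dst = set(source_columns), set(dest_columns)
    return {
        "table_fqn": table_fqn,
        "source_columns": len(source_columns),
        "dest_columns": len(dest_columns),
        "added_columns": sorted(dst - src),    # only in the destination (assumed meaning)
        "removed_columns": sorted(src - dst),  # only in the source
        "size_delta_bytes": dest_bytes - source_bytes,  # negative: destination is smaller
    }
```

This sketch compares columns by name only. A column whose type changed but whose name did not would not show up as drift here.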

POST /api/stale-scan

Scan a catalog (or several) for stale and orphan tables. Joins per-table stats with read activity from system.access.audit (90-day window by default) and classifies each table into HIGH / MEDIUM / LOW risk with suggested actions (OPTIMIZE, REVIEW_FOR_DROP, VACUUM_THEN_DROP, etc.). Powers the unused-tables surface on the FinOps page.

Request body:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | No |  | Single-mode catalog |
| source_catalogs | string[] | No |  | Multi-mode (parallel fan-out, max 3 concurrent). Mutually exclusive with source_catalog. |
| warehouse_id | string | No | From config | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| days_threshold | integer | No | 90 | Read-activity look-back window in days (1–365) |
| min_age_days | integer | No | 7 | Minimum table age; skips recently created tables |
| min_size_bytes | integer | No | 0 | De-noise filter; drops findings smaller than this size |
| check_small_files | boolean | No | false | When true, runs DESCRIBE DETAIL enrichment to detect fragmentation (adds 1–3s per catalog) |

Response:

{
"findings": [
{ "table_fqn": "prod.bronze.events_legacy", "catalog": "prod", "risk_level": "HIGH",
"last_read_days_ago": 180, "table_size_bytes": 2400000000,
"suggested_action": "VACUUM_THEN_DROP", "is_orphan": false, "has_small_files": false }
],
"summary": {
"total_tables_scanned": 412, "stale_count": 23, "orphan_count": 4,
"high_risk": 6, "medium_risk": 11, "low_risk": 6
},
"per_catalog": { "prod": { "total_scanned": 412, "stale_count": 23 } },
"errors": []
}
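Because source_catalog and source_catalogs are mutually exclusive, a client can catch a malformed body before sending it. The sketch below does that. It also requires exactly one of the two, which is an assumption: this page does not say how the server handles a body that omits both. `stale_scan_body` is a hypothetical helper.

```python
def stale_scan_body(source_catalog=None, source_catalogs=None, *, days_threshold=90,
                    min_size_bytes=0, check_small_files=False):
    """Build a POST /api/stale-scan body, enforcing the documented field rules."""
    # Assumption: exactly one of the two modes must be chosen.
    if (source_catalog is None) == (source_catalogs is None):
        raise ValueError("pass exactly one of source_catalog or source_catalogs")
    if not 1 <= days_threshold <= 365:
        raise ValueError("days_threshold must be between 1 and 365")
    body = {"days_threshold": days_threshold, "min_size_bytes": min_size_bytes,
            "check_small_files": check_small_files}
    if source_catalog is not None:
        body["source_catalog"] = source_catalog
    else:
        body["source_catalogs"] = list(source_catalogs)  # fanned out, max 3 at a time
    return body
```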

Notebooks

CRUD operations for SQL Notebooks in Data Lab. Notebooks are stored as JSON files on the server.

GET /api/notebooks

List all saved notebooks with basic metadata (id, title, cell count, updated date).

GET /api/notebooks/{id}

Get a single notebook by ID, including all cells.

POST /api/notebooks

Create a new notebook.

| Field | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Notebook title |
| cells | object[] | Yes | Array of {id, type, content} |

PUT /api/notebooks/{id}

Update an existing notebook's title and/or cells.

DELETE /api/notebooks/{id}

Delete a notebook by ID.

POST /api/notebooks/{id}/export

Export a notebook as a concatenated .sql file. Markdown cells become SQL comments.
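The export rule (concatenate cells, and turn Markdown cells into SQL comments) can be sketched as follows. The sketch assumes cell type values of "markdown" and "sql", "--" line comments, and a terminating semicolon on each SQL cell. None of these choices is confirmed by this page, and `notebook_to_sql` is hypothetical.

```python
def notebook_to_sql(notebook):
    """Concatenate notebook cells into one .sql script (hypothetical sketch)."""
    parts = []
    for cell in notebook["cells"]:
        content = cell["content"].strip()
        if cell["type"] == "markdown":
            # Each Markdown line becomes a SQL line comment
            parts.append("\n".join(f"-- {line}".rstrip() for line in content.splitlines()))
        else:
            # Terminate each statement so the cells can run back to back
            parts.append(content if content.endswith(";") else content + ";")
    return "\n\n".join(parts) + "\n"
```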


Deep Profiling

Column-level data profiling with histograms and top-N value frequencies.

POST /api/profile-table

Deep-profile a single catalog table.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| table_fqn | string | Yes |  | Three-part name catalog.schema.table |
| warehouse_id | string | No | From config | SQL warehouse ID |
| sample_limit | int | No | 0 | Limit rows (0 = full table) |
| top_n | int | No | 10 | Top N values for string columns |
| histogram_bins | int | No | 20 | Histogram bucket count |

Example response:

{
"table_fqn": "catalog.schema.table",
"row_count": 50000,
"profiled_at": "2026-03-31T10:00:00Z",
"columns": [
{
"column_name": "age",
"data_type": "INT",
"null_count": 150,
"null_pct": 0.3,
"distinct_count": 85,
"min": 18, "max": 99, "avg": 42.3,
"histogram": [{"bucket": 1, "freq": 120, "range_min": 18, "range_max": 22}, "..."],
"top_values": null
},
{
"column_name": "status",
"data_type": "STRING",
"null_count": 0,
"null_pct": 0,
"distinct_count": 4,
"min_length": 4, "max_length": 11, "avg_length": 6.8,
"histogram": null,
"top_values": [{"value": "active", "freq": 30000, "pct": 60.0}, "..."]
}
]
}

POST /api/profile-results

Deep-profile the results of an arbitrary SQL query. Wraps the SQL as a CTE to compute stats server-side without double execution.

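| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| sql | string | Yes |  | SQL query to profile |
| warehouse_id | string | No | From config | SQL warehouse ID |
| top_n | int | No | 10 | Top N values for string columns |
| histogram_bins | int | No | 20 | Histogram bucket count |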
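Wrapping the user's SQL in a CTE lets the aggregates run over the query result in a single statement. This avoids running the query once to fetch rows and again to profile them. The sketch below shows the technique for a few basic column stats. `profile_query` is hypothetical, and the server's generated SQL (which also builds histograms and top-N lists) will differ.

```python
def profile_query(user_sql: str, column: str) -> str:
    """Wrap arbitrary SQL in a CTE and compute basic stats for one column (sketch)."""
    body = user_sql.strip().rstrip(";")  # a trailing semicolon would break the CTE
    col = f"`{column.replace('`', '``')}`"  # backtick-quote, doubling embedded backticks
    # Newlines around the body stop a trailing "-- comment" from swallowing ")"
    return (
        f"WITH q AS (\n{body}\n)\n"
        f"SELECT COUNT(*) AS row_count,\n"
        f"       COUNT(*) - COUNT({col}) AS null_count,\n"
        f"       COUNT(DISTINCT {col}) AS distinct_count\n"
        f"FROM q"
    )
```

null_pct then follows as 100 × null_count / row_count. For the profile-table example above, 150 nulls in 50,000 rows gives 0.3.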

Config

Read, write, and compare clone configuration files.

GET /api/config

Load and return the current config.

| Parameter | Type | In | Required | Default | Description |
|---|---|---|---|---|---|
| path | string | query | No | config/clone_config.yaml | Config file path |
| profile | string | query | No |  | Config profile name |

Example request:

curl http://localhost:8080/api/config

PUT /api/config

Save config YAML to disk.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| yaml_content | string | Yes |  | Full YAML content |
| path | string | No | config/clone_config.yaml | File path to write |

Example request:

curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{"yaml_content": "source_catalog: prod\ndestination_catalog: prod_clone\n"}'

POST /api/config/diff

Compare two config files and return their differences.

| Field | Type | Required | Description |
|---|---|---|---|
| file_a | string | Yes | Path to first config |
| file_b | string | Yes | Path to second config |

POST /api/config/audit

Save audit trail settings to config YAML.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog | string | No | "clone_audit" | Audit catalog name |
| schema | string | No | "logs" | Audit schema name |

GET /api/config/profiles

List available config profiles.

| Parameter | Type | In | Required | Default | Description |
|---|---|---|---|---|---|
| path | string | query | No | config/clone_config.yaml | Config file path |

Example response:

{"profiles": ["dev", "staging", "prod"]}

PATCH /api/config/warehouse

Update the active SQL warehouse ID in the config file. The change persists across server restarts. The Settings page in the wizard calls this endpoint when the user picks a different warehouse from the dropdown.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| warehouse_id | string | Yes | Databricks SQL warehouse ID |

Response:

{ "status": "saved", "sql_warehouse_id": "abcd1234efgh5678" }

PATCH /api/config/performance

Update performance tuning fields (max_workers, parallel_tables, max_parallel_queries). All fields are optional: only the fields supplied in the body are updated, and the rest keep their current values.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| max_workers | integer | No | Schemas processed in parallel |
| parallel_tables | integer | No | Tables cloned in parallel within a schema |
| max_parallel_queries | integer | No | Upper bound on concurrent SQL statements |

Response:

{ "status": "saved" }

PATCH /api/config/pricing

Update storage pricing for cost calculations on the FinOps page.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| price_per_gb | number | No | Cost per GB-month for managed storage |
| currency | string | No | ISO 4217 currency code (e.g. "USD", "GBP") |

Response:

{ "status": "saved", "price_per_gb": 0.023, "currency": "USD" }

GET /api/config/streaming-limits

Read the configured form bounds for the /demo-data Streaming Events tab. The bounds are stored in config/streaming_limits.json, separately from clone_config.yaml, because they are UX form bounds rather than clone orchestration settings. The endpoint returns built-in defaults until the file has been written.

Response:

{
"events_per_batch": {"default": 100, "min": 1, "max": 10000},
"interval_seconds": {"default": 5, "min": 0.1, "max": 300},
"total_duration_seconds": {"default": 60, "min": 1, "max": 3600}
}

The same shape is also exposed at GET /api/generate/demo-data/streaming/limits for the demo-data page; both endpoints read the same source. The config endpoint is what the Settings → Performance → Streaming Form Limits card uses.

PATCH /api/config/streaming-limits

Update the streaming-emit form bounds. All body keys are optional: fields not in the body keep their current values, so a partial update (e.g. raising only events_per_batch.max) does not require resending the full shape.

Request body:

{
"events_per_batch": {"max": 50000},
"total_duration_seconds": {"default": 120}
}

Response:

{
"status": "saved",
"limits": {
"events_per_batch": {"default": 100, "min": 1, "max": 50000},
"interval_seconds": {"default": 5, "min": 0.1, "max": 300},
"total_duration_seconds": {"default": 120, "min": 1, "max": 3600}
}
}

Validation: each field must satisfy min ≤ default ≤ max. The server rejects any update that violates this invariant with HTTP 400 and a descriptive error message. The file is therefore never written in a state that would make every subsequent streaming request fail with 422.

The limits cache is based on the file's mtime, so a saved update invalidates it immediately. The next streaming form fetch picks up the new bounds without waiting out the 60-second cache window.
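The merge-then-validate behaviour described above can be sketched as follows. `apply_limits_patch` is a hypothetical name, and a raised ValueError stands in for the server's HTTP 400. Merging into a deep copy means a rejected patch never touches the stored limits.

```python
import copy

FIELDS = ("events_per_batch", "interval_seconds", "total_duration_seconds")
BOUND_KEYS = {"default", "min", "max"}


def apply_limits_patch(current, patch):
    """Merge a partial PATCH body into the current limits, then validate (sketch)."""
    merged = copy.deepcopy(current)  # never mutate the stored limits on failure
    for field, bounds in patch.items():
        if field not in FIELDS:
            raise ValueError(f"unknown field: {field}")
        unknown = set(bounds) - BOUND_KEYS
        if unknown:
            raise ValueError(f"{field}: unknown keys {sorted(unknown)}")
        merged[field].update(bounds)  # keys omitted from the patch keep their values
    for field in FIELDS:
        b = merged[field]
        if not b["min"] <= b["default"] <= b["max"]:
            raise ValueError(f"{field}: expected min <= default <= max, got {b}")
    return merged
```

Applied to the defaults with the request body shown above, this returns the same limits as the example response.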


Generate

Export clone configuration as Databricks Workflow JSON or Terraform HCL, create a persistent Databricks Job, or generate synthetic demo data.

POST /api/generate/workflow

Generate a Databricks Workflows job definition (JSON or YAML).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| format | string | No | "json" | "json" or "yaml" |
| output_path | string | No |  | Output file path |
| job_name | string | No |  | Workflow job name |
| cluster_id | string | No |  | Cluster ID to use |
| schedule | string | No |  | Cron schedule expression |
| notification_email | string | No |  | Email for job notifications |

Example request:

{
"format": "json",
"job_name": "nightly-clone",
"schedule": "0 0 2 * * ?"
}

Example response:

{
"output_path": "databricks_workflow.json",
"content": "{...}",
"format": "json"
}

POST /api/generate/terraform

Submit Terraform or Pulumi code generation as a background job.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Catalog to generate IaC for |
| warehouse_id | string | No | From config | SQL warehouse ID |
| format | string | No | "terraform" | "terraform" or "pulumi" |
| output_path | string | No |  | Output file path |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |

Example response:

{"job_id": "tf-abc123", "status": "queued", "message": "Terraform generation submitted"}

POST /api/generate/create-job

Create a persistent Databricks Job for scheduled catalog cloning.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Source catalog |
| destination_catalog | string | Yes |  | Destination catalog |
| job_name | string | No |  | Databricks Job name |
| volume | string | No |  | UC Volume path |
| schedule | string | No |  | Cron schedule expression |
| timezone | string | No | "UTC" | Schedule timezone |
| notification_emails | string[] | No | [] | Notification recipients |
| max_retries | integer | No | 0 | Max retry attempts |
| timeout | integer | No | 7200 | Timeout in seconds |
| tags | object | No | {} | Key-value tags for the job |
| update_job_id | integer | No |  | Existing job ID to update |
| clone_type | string | No | "DEEP" | "DEEP" or "SHALLOW" |
| load_type | string | No | "FULL" | "FULL" or "INCREMENTAL" |
| max_workers | integer | No | 4 | Parallel thread count |
| parallel_tables | integer | No | 1 | Tables to clone simultaneously |
| max_parallel_queries | integer | No | 10 | Max concurrent SQL queries |
| max_rps | float | No | 0 | Rate limit (requests per second) |
| copy_permissions | boolean | No | true | Copy table permissions |
| copy_ownership | boolean | No | true | Copy table ownership |
| copy_tags | boolean | No | true | Copy UC tags |
| copy_properties | boolean | No | true | Copy table properties |
| copy_security | boolean | No | true | Copy security settings |
| copy_constraints | boolean | No | true | Copy table constraints |
| copy_comments | boolean | No | true | Copy comments |
| enable_rollback | boolean | No | false | Enable rollback logging |
| validate_after_clone | boolean | No | false | Run validation after clone |
| validate_checksum | boolean | No | false | Use checksums for validation |
| force_reclone | boolean | No | false | Force re-clone of existing tables |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| include_schemas | string[] | No | [] | Only include these schemas |
| include_tables_regex | string | No |  | Regex filter for table names |
| exclude_tables_regex | string | No |  | Regex to exclude table names |
| order_by_size | string | No |  | "asc" or "desc" |
| as_of_timestamp | string | No |  | Time-travel timestamp |
| as_of_version | string | No |  | Time-travel Delta version |

Example request:

{
"source_catalog": "prod",
"destination_catalog": "prod_clone",
"job_name": "nightly-clone",
"schedule": "0 0 2 * * ?",
"clone_type": "DEEP",
"notification_emails": ["team@company.com"]
}

POST /api/generate/demo-data

Generate a demo catalog with synthetic data across multiple industries.

| Field | Type | Default | Description |
|---|---|---|---|
| catalog_name | string | required | Name of the catalog to create |
| industries | string[] | all 10 | Industries to generate |
| owner | string | null | Set as catalog owner |
| scale_factor | float | 1.0 | Row multiplier (0.01=10M, 0.1=100M, 1.0=2B) |
| batch_size | int | 5000000 | Rows per INSERT batch |
| max_workers | int | 4 | Parallel SQL workers |
| storage_location | string | null | Optional managed location |
| warehouse_id | string | null | Override SQL warehouse |
| drop_existing | bool | false | Drop existing catalog first |
| medallion | bool | true | Generate bronze/silver/gold schemas |
| create_functions | bool | true | Generate UDFs (20 per industry) |
| create_volumes | bool | true | Generate volumes and sample files |
| start_date | string | "2020-01-01" | Start of generated date range (YYYY-MM-DD) |
| end_date | string | "2025-01-01" | End of generated date range (YYYY-MM-DD) |
| dest_catalog | string | null | Optional destination catalog; the generated catalog is auto-cloned to this target |

Example request:

{
"catalog_name": "demo_source",
"industries": ["healthcare", "financial", "retail"],
"scale_factor": 0.1,
"medallion": true
}

Example response:

{"job_id": "abc123", "status": "queued", "message": "Demo data generation submitted"}

DELETE /api/generate/demo-data/{catalog_name}

Remove a demo catalog and all its contents.

Example request:

curl -X DELETE http://localhost:8080/api/generate/demo-data/demo_source

Example response:

{"catalog": "demo_source", "status": "cleaned", "schemas_dropped": 45, "tables_dropped": 312}

GET /api/generate/demo-data/catalogs

List catalogs the caller can read, with metadata and a demo flag (used by the Manage Catalogs tab on /demo-data). For each catalog, the endpoint queries <catalog>.information_schema.table_properties in parallel to detect tables tagged demo.generated_by = 'clone-xs'.

Query parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| demo_only | bool | false | When true, returns only catalogs with is_demo=true |

Example response:

{
"catalogs": [
{
"name": "demo_source",
"owner": "viral@example.com",
"comment": "",
"created_at": "2026-04-30T14:22:01Z",
"is_demo": true,
"num_demo_tables": 312,
"num_schemas": 45,
"num_tables": 312,
"error": null
}
],
"demo_only": false,
"total": 1
}

Per-catalog probe failures (e.g. PERMISSION_DENIED on information_schema) surface as the error field on the row; the listing as a whole doesn't abort.

POST /api/generate/demo-data/streaming

Start an in-process streaming-emit job. The runner writes a batch of JSON events to a UC Volume every interval_seconds until total_duration_seconds has elapsed. See the Demo Data Generator guide for details on the 10 built-in profiles.

Request body:

| Field | Type | Default | Description |
|---|---|---|---|
| catalog | string | (required) | Target catalog (created if missing) |
| schema | string | (required) | Target schema |
| volume | string | events_volume | UC Volume name (created if missing) |
| profile | string | (required) | One of: generic_sensor, industrial_machine, car_obd2, smart_meter, wearable_health, pos_terminal, wind_turbine, atm_transaction, server_metrics, clickstream |
| events_per_batch | int | 100 | Events per file (1..10000) |
| interval_seconds | float | 5.0 | Seconds between batches (0.1..300) |
| total_duration_seconds | int | 60 | Total run time, capped at 1 hour (1..3600) |
| num_devices | int? | profile default | Override the per-profile default device count |
| auto_create_bronze | bool | false | Run CREATE OR REFRESH STREAMING TABLE for the Bronze table |
| bronze_refresh_minutes | int | 5 | Streaming-table refresh cadence (1..60) |
| warehouse_id | string? | (config) | Override the SQL warehouse |

Returns: {job_id, status, message}. Poll /api/clone/{job_id} for live progress (events_emitted, files_written, current_batch_path).
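A polling loop for the job endpoint might look like the sketch below. The status values "queued" and "running" are assumptions ("queued" appears in this page's responses; "running" does not). It also assumes GET is the method for /api/clone/{job_id}. The fetcher and the clock are injected so the loop can be exercised without a server, and `wait_for_job` and `fetch_clone_job` are hypothetical helpers.

```python
import json
import time
import urllib.request


def fetch_clone_job(job_id, base_url="http://localhost:8080/api"):
    """GET /api/clone/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/clone/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(fetch_status, job_id, *, interval=2.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(job_id) until the job leaves the (assumed) active states."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") not in ("queued", "running"):
            return job  # finished, failed, or stopped: caller inspects job["status"]
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

Against a running server, a call would be `wait_for_job(fetch_clone_job, "abc123")`, with progress fields such as events_emitted read from the returned dict.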

POST /api/generate/demo-data/streaming/{job_id}/stop

Request a streaming-emit job to halt at its next tick. The runner sleeps in 0.5-second slices, so latency-to-stop is bounded regardless of interval_seconds.
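The sliced sleep that bounds stop latency can be sketched with a threading.Event as the stop flag. `sleep_until_next_batch` is a hypothetical helper modelled on the description above, not the runner's code. `stop_event.wait(timeout)` alone would give the same bound; the explicit loop simply mirrors the documented 0.5-second slices.

```python
import threading
import time


def sleep_until_next_batch(interval_seconds, stop_event, slice_seconds=0.5):
    """Sleep interval_seconds in short slices; return False if a stop was requested."""
    deadline = time.monotonic() + interval_seconds
    while not stop_event.is_set():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True  # interval elapsed: emit the next batch
        time.sleep(min(slice_seconds, remaining))
    return False  # stop requested: at most one slice of extra latency
```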

GET /api/generate/demo-data/streaming/auto-loader-sql

Return the canonical CREATE OR REFRESH STREAMING TABLE … SQL the in-process emitter would run. Used by the UI's copy-to-clipboard panel so users running the SQL manually get the same DDL.

Query parameters: catalog, schema, profile, refresh_minutes (default 5), volume (default events_volume).

GET /api/generate/demo-data/streaming/limits

Return the configured form bounds for the Streaming Events tab. The /demo-data page fetches this on mount to drive the HTML min/max attributes and clamp logic for Events per batch, Interval (seconds), and Total duration (seconds).

Reads the same source as GET /api/config/streaming-limits. It is duplicated here as a focused endpoint so the demo-data page does not have to fetch and dig through the full config blob. Edit the values via the Settings page or via PATCH /api/config/streaming-limits.

Response:

{
"events_per_batch": {"default": 100, "min": 1, "max": 10000},
"interval_seconds": {"default": 5, "min": 0.1, "max": 300},
"total_duration_seconds": {"default": 60, "min": 1, "max": 3600}
}

POST /api/generate/demo-data/streaming/schedule

Generate a self-contained Python notebook in the user's workspace and create a Databricks Job that runs it on a Quartz cron. Unlike the in-process /streaming endpoint, the resulting Job runs on Databricks compute and survives Clone-Xs API restarts. The Job is tagged created_by=clone-xs, kind=streaming-emit, profile=<profile> so it shows up in GET /api/generate/clone-jobs.

Request body: all StreamingEmissionRequest fields (the /streaming request body above), plus:

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | auto | Job name (`clxs-stream-<profile>-<utc-iso>` if empty) |
| schedule_quartz_cron | string | `0 */5 * * * ?` | Quartz cron (6 or 7 fields) |
| timezone_id | string | UTC | IANA timezone |
| notebook_path | string? | auto | Workspace path; default `/Users/<me>/clxs/streaming_<profile>_<isoZ>` |
| use_serverless | bool | true | Use Serverless compute; false falls back to Single-Node job cluster |

Example response:

{
"job_id": 1234567890,
"run_url": "https://<workspace>/#job/1234567890",
"notebook_path": "/Users/me@example.com/clxs/streaming_generic_sensor_20260501T120000Z",
"schedule_quartz_cron": "0 */5 * * * ?",
"timezone_id": "UTC",
"tags": {"created_by": "clone-xs", "kind": "streaming-emit", "profile": "generic_sensor"}
}

Returns HTTP 500 with the SDK error if client.jobs.create fails (e.g. serverless compute is not enabled, or the caller lacks permission to create jobs). The in-process Start path still works in that case, and users can run the notebook manually from the workspace.
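Quartz expressions differ from 5-field Unix cron. They start with a seconds field, and one of day-of-month or day-of-week must be "?". A common mistake is pasting a Unix expression, and the structural check below catches it before the request is sent. `check_quartz_cron` is hypothetical and does not validate value ranges; Databricks remains the authority on whether an expression is accepted.

```python
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def check_quartz_cron(expr: str) -> dict:
    """Light structural check of a Quartz cron expression (6 or 7 fields)."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expr!r} "
                         "(a 5-field Unix cron needs a leading seconds field)")
    if "?" not in (parts[3], parts[5]):
        raise ValueError("Quartz requires '?' in day-of-month or day-of-week")
    return dict(zip(QUARTZ_FIELDS, parts))  # the year key is present only for 7 fields
```

The same check applies to the "0 0 2 * * ?" schedules used in the workflow and create-job examples (02:00 every day).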


Management

Catalog management: preflight checks, rollback, PII scan, sync, audit trail, compliance, templates, scheduling, multi-clone, lineage, impact analysis, preview, warehouse control, RBAC, plugins, and monitoring metrics.

POST /api/preflight

Run pre-flight checks before cloning (permissions, connectivity, catalog existence).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Source catalog |
| destination_catalog | string | Yes |  | Destination catalog |
| warehouse_id | string | No |  | SQL warehouse ID |
| check_write | boolean | No | true | Test write permissions |

Example request:

{"source_catalog": "prod", "destination_catalog": "prod_clone"}

GET /api/rollback/logs

List available rollback logs. Queries the Delta audit table first and falls back to local JSON files if the Delta table is unavailable.

Example response:

[
{
"rollback_id": "rb-20260315-103000",
"log_file": "rollback_2026-03-15_10-30-00.json",
"table_versions": {"sales.orders": 12, "sales.customers": 8},
"restore_mode": "RESTORE",
"timestamp": "2026-03-15T10:30:00Z"
}
]

POST /api/rollback

Roll back a previous clone operation using a rollback log.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| log_file | string | Yes |  | Rollback log file name |
| warehouse_id | string | No |  | SQL warehouse ID |
| drop_catalog | boolean | No | false | Drop entire destination catalog |

POST /api/pii-scan

Scan a catalog for PII columns (email, SSN, phone, etc.).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Catalog to scan |
| warehouse_id | string | No |  | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| sample_data | boolean | No | false | Sample actual data values |
| max_workers | integer | No | 4 | Parallel thread count |

POST /api/sync

Submit a catalog sync as a background job. Syncs schema/table structure between source and destination.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Source catalog |
| destination_catalog | string | Yes |  | Destination catalog |
| warehouse_id | string | No |  | SQL warehouse ID |
| exclude_schemas | string[] | No | ["information_schema", "default"] | Schemas to skip |
| dry_run | boolean | No | false | Preview without executing |
| drop_extra | boolean | No | false | Drop objects that exist only in the destination |

Example response:

{"job_id": "sync-abc123", "status": "queued", "message": "Sync job submitted"}

GET /api/catalogs

List all Unity Catalog catalogs in the workspace.

Example response:

["prod", "staging", "dev", "sandbox"]

GET /api/catalogs/{catalog}/schemas

List schemas in a catalog (excludes information_schema and default).

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| catalog | string | path | Yes | Catalog name |

GET /api/catalogs/{catalog}/info

Returns catalog metadata (owner, comment, storage root) via DESCRIBE CATALOG EXTENDED. Used by the Catalog Explorer page header and the clone wizard's catalog-info popovers.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| catalog | string | path | Yes | Catalog name |

Response:

{
"name": "prod",
"storage_root": "s3://my-bucket/managed/prod",
"owner": "data-team@example.com",
"comment": "Production catalog"
}

GET /api/catalogs/{catalog}/{schema}/tables

List tables in a schema.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| catalog | string | path | Yes | Catalog name |
| schema | string | path | Yes | Schema name |

GET /api/catalogs/{catalog}/{schema}/objects

List every cloneable object in a schema: tables, views, functions, and volumes. Used by the UI Scope Picker to render the object tree. The endpoint uses the Databricks SDK, so no SQL warehouse is required.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| catalog | string | path | Yes | Catalog name |
| schema | string | path | Yes | Schema name |

Example response:

{
"tables": ["orders", "customers", "line_items"],
"views": ["v_active_customers", "v_monthly_revenue"],
"functions": ["calculate_discount"],
"volumes": ["raw_uploads", "exports"]
}

GET /api/catalogs/{catalog}/{schema}/{table}/info

Get table metadata (owner, type, storage location, properties, columns) via the Databricks SDK.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| catalog | string | path | Yes | Catalog name |
| schema | string | path | Yes | Schema name |
| table | string | path | Yes | Table name |

Example response:

{
"name": "orders",
"catalog": "prod",
"schema": "sales",
"table_type": "MANAGED",
"owner": "data-team",
"storage_location": "dbfs:/user/hive/warehouse/prod.db/sales/orders",
"columns": [
{"name": "order_id", "type": "BIGINT", "nullable": false},
{"name": "customer_id", "type": "BIGINT", "nullable": true}
],
"properties": {"delta.minReaderVersion": "1"}
}

GET /api/audit

Get clone audit trail entries from Unity Catalog Delta tables.

Example response:

[
{
"job_id": "a1b2c3d4",
"source_catalog": "prod",
"destination_catalog": "prod_clone",
"status": "completed",
"completed_at": "2025-01-15T10:45:00Z"
}
]

POST /api/audit/init

Initialize the audit and run-log Delta tables in Unity Catalog.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| warehouse_id | string | No |  | SQL warehouse ID |
| catalog | string | No | "clone_audit" | Audit catalog name |
| schema | string | No | "logs" | Audit schema name |

Example response:

{
"status": "ok",
"tables_created": [
"clone_audit.logs.run_logs",
"clone_audit.logs.clone_operations",
"clone_audit.metrics.clone_metrics"
],
"schemas": { "..." : "..." }
}

POST /api/audit/describe

Describe the schemas of the audit tables.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog | string | No | "clone_audit" | Audit catalog name |
| schema | string | No | "logs" | Audit schema name |

GET /api/audit/{job_id}/logs

Get full run log detail (including log lines) for a specific job from Delta.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| job_id | string | path | Yes | Job ID |

POST /api/compliance

Generate a compliance report for a catalog.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog | string | No |  | Catalog to audit |
| report_type | string | No | "data_governance" | Type of compliance report |

GET /api/compliance/frameworks

List supported compliance frameworks (SOC2, GDPR, HIPAA, CCPA, DORA, etc.) with the most recent assessment score per framework. Backs the framework-grid on the Compliance page.

Response:

[
{ "id": "soc2", "name": "SOC 2 Type II", "version": "2017",
"control_count": 12, "score": 0.85, "last_assessed": "2026-05-02T09:15:00Z" },
{ "id": "gdpr", "name": "GDPR", "version": "2018",
"control_count": 8, "score": 0.78, "last_assessed": "2026-05-02T08:45:00Z" }
]

POST /api/compliance/frameworks/{framework_name}/assess

Run a fresh compliance assessment against all controls in the named framework. Collects evidence (RBAC audit, PII audit, audit-log retention, etc.) and computes a score. Persisted into <audit>.compliance.evidence so the trend endpoint can chart improvement over time.

| Parameter | Type | In | Required | Description |
|---|---|---|---|---|
| framework_name | string | path | Yes | One of soc2, gdpr, hipaa, ccpa, dora |

Response:

{
"framework_id": "soc2", "framework_name": "SOC 2 Type II",
"total_controls": 12, "met_controls": 10, "partial_controls": 1, "gap_controls": 1,
"score": 0.85, "assessed_at": "2026-05-02T10:35:12Z",
"evidence": [
{ "control_id": "CC6.1", "control_name": "Logical Access Controls",
"status": "met", "evidence_count": 5 }
]
}

GET /api/compliance/frameworks/{framework_name}/gaps

List controls in the framework for which the most recent assessment found insufficient evidence. The Compliance page surfaces these gaps as the day-to-day triage queue.

Response:

[
{ "evidence_id": "evd-789", "framework_id": "gdpr", "control_id": "A.32.1",
"control_name": "Security of Processing", "evidence_type": "rbac_audit",
"evidence_summary": "Missing role assignments for sensitive schemas",
"evidence_count": 0, "status": "gap", "collected_at": "2026-05-02T10:00:00Z" }
]

GET /api/compliance/frameworks/{framework_name}/trend

Historical score trend for a framework. Powers the line chart on the Compliance page so improvement (or regression) is visible over weeks/months.

Response:

[
{ "score": 0.72, "assessed_at": "2026-04-25T09:00:00Z" },
{ "score": 0.78, "assessed_at": "2026-05-01T09:00:00Z" },
{ "score": 0.85, "assessed_at": "2026-05-02T10:35:12Z" }
]

GET /api/templates

List available clone templates (pre-configured clone profiles).

Example response:

[
{"name": "dev-refresh", "description": "Refresh dev from prod", "clone_type": "SHALLOW"}
]

GET /api/schedule

List scheduled clone jobs.

POST /api/schedule

Create a scheduled clone job.

| Field | Type | Required | Description |
|---|---|---|---|
| (varies) | object | Yes | Schedule configuration object |

POST /api/multi-clone

Clone a source catalog to multiple destinations simultaneously.

| Field | Type | Required | Description |
|---|---|---|---|
| source_catalog | string | Yes | Source catalog |
| destinations | array | Yes | [{"catalog": "clone_1"}, ...] |
| clone_type | string | No | "DEEP" or "SHALLOW" |

Example request:

{
"source_catalog": "prod",
"destinations": [{"catalog": "staging"}, {"catalog": "dev"}],
"clone_type": "DEEP"
}

Example response:

[
{"destination": "staging", "job_id": "mc-001", "status": "queued"},
{"destination": "dev", "job_id": "mc-002", "status": "queued"}
]

POST /api/lineage

Query lineage for a catalog or table.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| table | string | No | Limit the lineage query to a specific table |

POST /api/impact

Analyze downstream impact of changes to a catalog, schema, or table.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| schema | string | No | Schema name |
| table | string | No | Table name |

POST /api/preview

Preview source vs destination data side by side.

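| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes |  | Source catalog |
| dest_catalog | string | Yes |  | Destination catalog |
| schema | string | Yes |  | Schema name |
| table | string | Yes |  | Table name |
| limit | integer | No | 50 | Max rows to preview |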

POST /api/warehouse/start

Start a SQL warehouse.

| Field | Type | Required | Description |
|---|---|---|---|
| warehouse_id | string | Yes | Warehouse ID |

POST /api/warehouse/stop

Stop a SQL warehouse.

| Field | Type | Required | Description |
|---|---|---|---|
| warehouse_id | string | Yes | Warehouse ID |

GET /api/rbac/policies

List RBAC policies.

POST /api/rbac/policies

Create an RBAC policy.

| Field | Type | Required | Description |
|---|---|---|---|
| (varies) | object | Yes | Policy definition |

GET /api/plugins

List available plugins.

POST /api/plugins/toggle

Enable or disable a plugin.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes |  | Plugin name |
| enabled | boolean | No | true | Enable or disable |

GET /api/monitor/metrics

Get clone operation metrics from Delta tables (throughput, failure rates, duration trends).

GET /api/notifications

Returns recent clone events (completions, failures, TTL warnings), sourced from the run_logs and clone_operations Delta tables.

Example response:

{
"unread_count": 3,
"items": [
{
"type": "success",
"message": "Clone completed: prod -> prod_clone",
"timestamp": "2025-01-15T10:45:00Z",
"status": "completed",
"job_id": "a1b2c3d4"
}
]
}

GET /api/catalog-health

Returns per-catalog health scores based on recent operations (success rate, trend, skipped-table ratio).

Example response:

{
"catalogs": [
{
"catalog": "prod",
"total": 10,
"succeeded": 9,
"failed": 1,
"last_operation": "2025-01-15T10:45:00Z",
"score": 90
}
]
}

Monitor

Continuous monitoring: compare source and destination catalogs in real time.

POST /api/monitor

Run a single monitoring check between source and destination catalogs.

| Parameter | Type | In | Required | Default | Description |
|---|---|---|---|---|---|
| source_catalog | string | query | Yes |  | Source catalog |
| destination_catalog | string | query | Yes |  | Destination catalog |
| warehouse_id | string | query | No |  | SQL warehouse ID |
| check_drift | boolean | query | No | true | Check for schema drift |
| check_counts | boolean | query | No | false | Check row count mismatches |

Example request:

curl -X POST "http://localhost:8080/api/monitor?source_catalog=prod&destination_catalog=prod_clone&check_drift=true"
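Unlike most POST endpoints here, /api/monitor takes its inputs as query parameters, and Python's str(True) is "True". The sketch below builds the same URL as the curl example, sending lowercase booleans to match it, plus an explicit check_counts. `monitor_url` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import urlencode


def monitor_url(source, destination, *, check_drift=True, check_counts=False,
                warehouse_id=None, base_url="http://localhost:8080/api"):
    """Build the POST /api/monitor URL; every input travels as a query parameter."""
    params = {"source_catalog": source, "destination_catalog": destination,
              # lowercase to match the documented curl example
              "check_drift": str(check_drift).lower(),
              "check_counts": str(check_counts).lower()}
    if warehouse_id:
        params["warehouse_id"] = warehouse_id
    return f"{base_url}/monitor?{urlencode(params)}"


# Empty body: all inputs are in the URL. urlopen(req) sends it.
req = urllib.request.Request(monitor_url("prod", "prod_clone"), data=b"", method="POST")
```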

Incremental

Incremental sync: detect changed tables using Delta version history and sync only what changed.

POST /api/incremental/check

Find tables that have changed since the last sync.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| schema_name | string | Yes | | Schema to check |
| warehouse_id | string | No | | SQL warehouse ID |
| clone_type | string | No | "DEEP" | Clone type |
| dry_run | boolean | No | false | Preview mode |

Example response:

{
"schema": "sales",
"tables_needing_sync": 3,
"tables": ["orders", "line_items", "payments"]
}

POST /api/incremental/sync

Submit an incremental sync job (only syncs changed tables).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| schema_name | string | Yes | | Schema to sync |
| warehouse_id | string | No | | SQL warehouse ID |
| clone_type | string | No | "DEEP" | Clone type |
| dry_run | boolean | No | false | Preview mode |
| serverless | boolean | No | false | Use serverless compute |
| volume | string | No | | UC Volume path |

Example response:

{"job_id": "inc-abc123", "status": "queued", "message": "Incremental sync job submitted"}
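The check and sync endpoints accept the same core fields, so a client can run the check first and submit a sync job only when something changed. A minimal sketch, assuming a hypothetical `post(path, body)` callable that sends JSON to the base URL and returns the decoded response; the function name `sync_if_changed` is also illustrative.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: POST JSON to http://localhost:8080/api{path}
# and return the decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def sync_if_changed(post: PostFn, source_catalog: str, destination_catalog: str,
                    schema_name: str, **options: Any) -> Dict[str, Any]:
    """Check a schema for changed tables and submit a sync job only if needed."""
    body = {
        "source_catalog": source_catalog,
        "destination_catalog": destination_catalog,
        "schema_name": schema_name,
        # Pass only fields both endpoints accept, e.g. warehouse_id, clone_type, dry_run
        **options,
    }
    check = post("/incremental/check", body)
    if check.get("tables_needing_sync", 0) == 0:
        return {"status": "up_to_date", "tables": []}
    job = post("/incremental/sync", body)
    return {"status": job.get("status"), "job_id": job.get("job_id"),
            "tables": check.get("tables", [])}
```

Passing the HTTP call in as a function keeps the sketch independent of any particular client library and makes it easy to exercise with a stub.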

Sampling

Data sampling -- preview and compare source/destination table data side by side.

POST /api/sample

Get sample rows from a table.

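| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog | string | Yes | | Catalog name |
| schema_name | string | Yes | | Schema name |
| table_name | string | Yes | | Table name |
| warehouse_id | string | No | | SQL warehouse ID |
| limit | integer | No | 10 | Number of rows |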

Example request:

{"catalog": "prod", "schema_name": "sales", "table_name": "orders", "limit": 5}

Example response:

{
"catalog": "prod",
"schema": "sales",
"table": "orders",
"rows": [{"order_id": 1, "amount": 99.99}, "..."]
}

POST /api/sample/compare

Compare sample rows between source and destination tables.

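| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source_catalog | string | Yes | | Source catalog |
| destination_catalog | string | Yes | | Destination catalog |
| schema_name | string | Yes | | Schema name |
| table_name | string | Yes | | Table name |
| warehouse_id | string | No | | SQL warehouse ID |
| limit | integer | No | 5 | Number of rows |
| order_by | string | No | | Column to order by |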

Dependencies

Dependency analysis -- map view and function dependencies, compute creation order for cloning.

POST /api/column-usage

Get column usage analytics for a catalog. The default (fast) mode reads information_schema.columns and typically returns in under 2 seconds. Set use_system_tables: true to query system.access.column_lineage for richer usage data, and set include_query_history: true to also analyze system.query.history. If the system tables are unavailable, the endpoint returns a graceful error instead of an HTTP 500.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog | string | Yes | | Catalog name |
| schema_name | string | No | | Filter by schema |
| warehouse_id | string | No | | SQL warehouse ID |
| use_system_tables | boolean | No | false | Use system.access.column_lineage for usage data |
| include_query_history | boolean | No | false | Include query history analysis |

Example response:

{
"catalog": "prod",
"columns": [
{"column": "customer_id", "table": "sales.orders", "usage_count": 1230},
{"column": "order_date", "table": "sales.orders", "usage_count": 980}
],
"source": "system.access.column_lineage",
"fallback": false
}

POST /api/dependencies/views

Get the view dependency graph for a schema. If the system tables are unavailable, returns a graceful error instead of an HTTP 500.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| schema_name | string | Yes | Schema name |
| warehouse_id | string | No | SQL warehouse ID |

Example response:

{
"catalog": "prod",
"schema": "sales",
"dependencies": [
{"view": "daily_summary", "depends_on": ["orders", "line_items"]}
]
}

POST /api/dependencies/functions

Get the function dependency graph for a schema. If the system tables are unavailable, returns a graceful error instead of an HTTP 500.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| schema_name | string | Yes | Schema name |
| warehouse_id | string | No | SQL warehouse ID |

POST /api/dependencies/order

Get the topologically sorted creation order for views, so that each view is created after its dependencies. If the system tables are unavailable, returns a graceful error instead of an HTTP 500.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| schema_name | string | Yes | Schema name |
| warehouse_id | string | No | SQL warehouse ID |

Example response:

{
"catalog": "prod",
"schema": "sales",
"creation_order": ["base_view", "mid_view", "top_view"]
}
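The same kind of ordering can be reproduced client-side from a /api/dependencies/views response with a topological sort. This sketch uses Python's standard `graphlib` module (3.9+); it illustrates the technique and is not necessarily how Clone-Xs computes creation_order. The function name `creation_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def creation_order(dependencies):
    """Order views so each one comes after the objects it depends on.

    `dependencies` follows the shape of the /api/dependencies/views response:
    [{"view": "daily_summary", "depends_on": ["orders", "line_items"]}, ...]
    Only views are returned; base tables are assumed to exist already.
    """
    graph = {d["view"]: set(d["depends_on"]) for d in dependencies}
    views = set(graph)
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError (a ValueError subclass) on circular references.
    return [name for name in TopologicalSorter(graph).static_order() if name in views]
```

Filtering to the keys of the graph drops base tables such as `orders`, which appear only as dependencies.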

Explorer

Endpoints powering the Explorer page's catalog browsing, UC object discovery, and table usage analytics.

GET /api/uc-objects

List all Unity Catalog workspace objects: External Locations, Storage Credentials, Connections, Registered Models (ML), Metastore info, Shares, and Recipients. Uses the Databricks SDK directly (no SQL warehouse required).

Example request:

curl http://localhost:8080/api/uc-objects \
-H "X-Databricks-Host: https://adb-123456.azuredatabricks.net" \
-H "X-Databricks-Token: dapi..."

Example response:

{
"external_locations": [
{"name": "my_location", "url": "abfss://container@storage.dfs.core.windows.net/path"}
],
"storage_credentials": [
{"name": "my_credential", "type": "AZURE_MANAGED_IDENTITY"}
],
"connections": [],
"registered_models": [
{"name": "fraud_model", "catalog": "ml", "schema": "models"}
],
"metastore": {"name": "main", "owner": "admin"},
"shares": [],
"recipients": []
}

POST /api/table-usage

Get the most frequently used tables in a catalog based on query frequency. Queries system.query.history for table access counts.

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |
| schema_name | string | No | Filter by schema |
| warehouse_id | string | No | SQL warehouse ID |
| limit | integer | No | Max tables to return (default 10) |

Example request:

curl -X POST http://localhost:8080/api/table-usage \
-H "Content-Type: application/json" \
-d '{"catalog": "prod", "limit": 5}'

Example response:

{
"catalog": "prod",
"tables": [
{"table": "sales.orders", "query_count": 4521, "last_accessed": "2026-03-17T10:30:00Z"},
{"table": "sales.customers", "query_count": 3102, "last_accessed": "2026-03-17T09:15:00Z"},
{"table": "inventory.products", "query_count": 1890, "last_accessed": "2026-03-16T22:45:00Z"}
]
}

Cache Management

Clone-Xs caches Databricks SDK metadata (schemas, tables, views, functions, volumes, table info, catalog info) in a process-local, in-memory cache with a configurable TTL (default: 5 minutes). This eliminates redundant API calls during operations like diff, stats, and validation that query the same metadata repeatedly.

The cache is automatically invalidated after clone, sync, and incremental sync jobs complete. You can also manage it manually via these endpoints.
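To make the TTL and invalidation behavior concrete, here is a minimal Python sketch of a process-local cache whose counters mirror the fields returned by /api/cache/stats. It is an illustration under those assumptions, not the Clone-Xs implementation; `TTLCache` and its method names are hypothetical, and the clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Process-local metadata cache with per-entry expiry (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # (catalog, key) -> (stored_at, value)
        self._entries: Dict[Tuple[str, Hashable], Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, catalog: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return a cached value, calling `loader` only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get((catalog, key))
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader()
        self._entries[(catalog, key)] = (now, value)
        return value

    def invalidate(self, catalog: str) -> int:
        """Drop every entry for one catalog; return how many were removed."""
        stale = [k for k in self._entries if k[0] == catalog]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def clear(self) -> None:
        """Drop all entries and reset the hit/miss counters."""
        self._entries.clear()
        self.hits = self.misses = 0

    def stats(self) -> Dict[str, Any]:
        return {"hits": self.hits, "misses": self.misses,
                "size": len(self._entries), "ttl_seconds": self.ttl_seconds}
```

Keying entries by catalog is what makes a per-catalog invalidation cheap: it only needs to scan for entries with a matching first element.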

GET /api/cache/stats

Returns cache hit/miss counters and current size.

Example request:

curl http://localhost:8080/api/cache/stats

Example response:

{
"hits": 42,
"misses": 15,
"size": 15,
"ttl_seconds": 300.0
}

POST /api/cache/clear

Clear all cached metadata entries and reset counters.

Example request:

curl -X POST http://localhost:8080/api/cache/clear

Example response:

{
"status": "cleared"
}

POST /api/cache/invalidate

Invalidate cached metadata for a specific catalog. Useful after making changes to a catalog outside of Clone-Xs.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| catalog | string | Yes | Catalog name |

Example request:

curl -X POST http://localhost:8080/api/cache/invalidate \
-H "Content-Type: application/json" \
-d '{"catalog": "prod"}'

Example response:

{
"status": "invalidated",
"catalog": "prod",
"entries_removed": 8
}

Delta Live Tables (DLT)

Discover, clone, monitor, and manage DLT pipelines. All endpoints are under /api/dlt/.

GET /api/dlt/pipelines

List all DLT pipelines with state, health, and creator.

Query parameters: filter (optional pipeline name filter)

GET /api/dlt/pipelines/{pipeline_id}

Get full pipeline configuration, libraries, clusters, and status.

POST /api/dlt/pipelines/{pipeline_id}/trigger

Trigger a pipeline run.

Request body: { "full_refresh": false }

POST /api/dlt/pipelines/{pipeline_id}/stop

Stop a running pipeline.

POST /api/dlt/pipelines/{pipeline_id}/clone

Clone pipeline definition within the same workspace.

Request body: { "new_name": "My Clone", "dry_run": false }

POST /api/dlt/pipelines/{pipeline_id}/clone-to-workspace

Clone pipeline definition to a different Databricks workspace.

Request body:

{
"new_name": "Pipeline DR Copy",
"dest_host": "https://adb-xxx.azuredatabricks.net",
"dest_token": "dapi...",
"dry_run": false
}

For pipelines without notebook libraries (serverless/SQL), a placeholder notebook is created automatically in the destination workspace.

GET /api/dlt/pipelines/{pipeline_id}/events

Get the pipeline event log. Query: max_events (default 100).

GET /api/dlt/pipelines/{pipeline_id}/updates

Get pipeline run/update history.

GET /api/dlt/pipelines/{pipeline_id}/lineage

Map DLT datasets to Unity Catalog tables in the pipeline's target schema.

GET /api/dlt/pipelines/{pipeline_id}/expectations

Query DLT expectation results from system.lakeflow.pipeline_events. Query: days (default 7)

GET /api/dlt/dashboard

Full DLT health dashboard: pipeline states, health, recent events.


RTBF (Right to Be Forgotten)

GDPR Article 17 erasure workflow. All endpoints are under /api/rtbf/.

POST /api/rtbf/requests

Submit a new erasure request.

Request body:

{
"subject_type": "email",
"subject_value": "user@example.com",
"requester_email": "dpo@company.com",
"requester_name": "Data Protection Officer",
"legal_basis": "GDPR Article 17(1)(a) - Consent withdrawn",
"strategy": "delete",
"grace_period_days": 0,
"notes": "Customer requested account deletion"
}

Parameters:

| Field | Required | Default | Description |
|---|---|---|---|
| subject_type | Yes | email | Identifier type: email, customer_id, ssn, phone, name, national_id, passport, credit_card, custom |
| subject_value | Yes | | The identifier value to search for and delete |
| subject_column | No | | Required when subject_type is custom |
| requester_email | Yes | | Email of person requesting erasure |
| requester_name | Yes | | Name of person requesting erasure |
| legal_basis | No | GDPR Art. 17(1)(a) | Legal basis for the erasure |
| strategy | No | delete | Deletion strategy: delete, anonymize, pseudonymize |
| scope_catalogs | No | all | Limit search to specific catalogs |
| grace_period_days | No | 0 | Days to wait before execution |
| notes | No | | Additional context |

GET /api/rtbf/requests

List requests with optional filters.

Query parameters: status, from_date, to_date, limit (default 50)

GET /api/rtbf/requests/{request_id}

Get full details for a single request.

PUT /api/rtbf/requests/{request_id}/status

Update request status (approve, hold, cancel).

Request body: { "status": "approved" | "on_hold" | "cancelled", "reason": "optional" }

POST /api/rtbf/requests/{request_id}/discover

Run subject discovery across all cloned catalogs (async job).

Request body: { "subject_value": "user@example.com" }

GET /api/rtbf/requests/{request_id}/impact

Get impact analysis — affected catalogs, schemas, tables, row counts.

POST /api/rtbf/requests/{request_id}/execute

Execute deletion/anonymization (async job). Supports dry-run.

Request body: { "subject_value": "user@example.com", "strategy": "delete", "dry_run": false }

POST /api/rtbf/requests/{request_id}/vacuum

VACUUM all affected tables to physically remove Delta history (async job).

Request body: { "retention_hours": 0 }

POST /api/rtbf/requests/{request_id}/verify

Verify deletion by re-querying all affected tables (async job).

Request body: { "subject_value": "user@example.com" }

POST /api/rtbf/requests/{request_id}/certificate

Generate a GDPR-compliant deletion certificate (HTML + JSON).

GET /api/rtbf/requests/{request_id}/certificate

Get the latest certificate for a request.

GET /api/rtbf/requests/{request_id}/certificate/download

Download certificate as a file.

Query parameters: format=html (default) or format=json

GET /api/rtbf/requests/{request_id}/actions

Get all actions (discover, delete, vacuum, verify) for a request.

GET /api/rtbf/requests/overdue

Get requests that have passed their GDPR 30-day deadline.

GET /api/rtbf/requests/approaching-deadline

Get requests approaching their deadline.

Query parameters: warn_days (default 5)
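A sketch of how the overdue and approaching-deadline checks can be expressed, assuming the deadline is the submission date plus the 30 days mentioned above and that warn_days counts inclusively. GDPR Article 12(3) itself specifies one month (extendable by two further months where necessary), so treat the 30-day figure as this tool's approximation. The function name is hypothetical.

```python
from datetime import date, timedelta

GDPR_DEADLINE_DAYS = 30  # assumption: the 30-day window used by /api/rtbf/requests/overdue


def deadline_status(submitted: date, today: date, warn_days: int = 5) -> str:
    """Classify a request as 'overdue', 'approaching', or 'ok' (hypothetical helper)."""
    deadline = submitted + timedelta(days=GDPR_DEADLINE_DAYS)
    if today > deadline:
        return "overdue"
    if (deadline - today).days <= warn_days:  # inclusive warning window
        return "approaching"
    return "ok"
```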

GET /api/rtbf/dashboard

Dashboard summary: total, pending, in_progress, completed, overdue, avg_processing_days.
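The endpoints in this section form a sequence. The sketch below lists one plausible order of calls for a single request, following the order in which the endpoints are documented here; it only builds the plan and sends nothing. Discover, execute, vacuum, and verify run as async jobs, so a real client must wait for each job to finish before moving on; the job-status endpoint is not covered in this section, so that step is left out. `rtbf_plan` is a hypothetical name.

```python
def rtbf_plan(request_id, subject_value, strategy="delete", dry_run=True):
    """Return the ordered (method, path, body) calls for one erasure request.

    Hypothetical helper: it does not perform HTTP calls or wait for async jobs.
    """
    base = f"/api/rtbf/requests/{request_id}"
    steps = [
        ("PUT", f"{base}/status", {"status": "approved"}),
        ("POST", f"{base}/discover", {"subject_value": subject_value}),
        ("GET", f"{base}/impact", None),
        ("POST", f"{base}/execute",
         {"subject_value": subject_value, "strategy": strategy, "dry_run": dry_run}),
    ]
    if dry_run:
        return steps  # a dry run changes nothing, so there is nothing to vacuum or verify
    steps += [
        ("POST", f"{base}/vacuum", {"retention_hours": 0}),
        ("POST", f"{base}/verify", {"subject_value": subject_value}),
        ("POST", f"{base}/certificate", None),
    ]
    return steps
```

Running the plan once with dry_run=True and reviewing the impact output before a real run keeps the destructive steps behind an explicit decision.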


DSAR (Data Subject Access Request)

GDPR Article 15 right of access, plus data portability (Article 20): discover, export, and report on every row across cloned catalogs that matches a data subject. All endpoints are under /api/dsar/.

POST /api/dsar/requests

Submit a new DSAR request to retrieve all personal data for a subject.

Request body:

| Field | Type | Required | Description |
|---|---|---|---|
| subject_type | string | Yes | One of email, customer_id, ssn, phone, name, national_id, passport, credit_card, custom |
| subject_value | string | Yes | The identifier value to search for |
| subject_column | string | If subject_type=custom | Column name to search on |
| requester_email | string | Yes | Email of the requestor / DPO |
| requester_name | string | Yes | Name of the requestor |
| legal_basis | string | No | Default "GDPR Article 15 - Right of access" |
| export_format | string | No | csv (default), json, or parquet |
| scope_catalogs | string[] | No | Catalogs to search (default: all) |
| notes | string | No | Audit-trail notes |

Response: { "request_id": "…", "status": "submitted", "deadline": "2026-06-02" }
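A minimal sketch of building a request body for this endpoint while enforcing the constraints in the table above (allowed subject_type values, subject_column for custom, the three export formats). The helper name is hypothetical; legal_basis is omitted so the server default applies.

```python
ALLOWED_SUBJECT_TYPES = {"email", "customer_id", "ssn", "phone", "name",
                         "national_id", "passport", "credit_card", "custom"}
ALLOWED_EXPORT_FORMATS = {"csv", "json", "parquet"}


def dsar_request_body(subject_type, subject_value, requester_email, requester_name,
                      subject_column=None, export_format="csv", scope_catalogs=None,
                      notes=None):
    """Build a POST /api/dsar/requests body (hypothetical helper)."""
    if subject_type not in ALLOWED_SUBJECT_TYPES:
        raise ValueError(f"unsupported subject_type: {subject_type}")
    if subject_type == "custom" and not subject_column:
        raise ValueError("subject_column is required when subject_type is 'custom'")
    if export_format not in ALLOWED_EXPORT_FORMATS:
        raise ValueError(f"unsupported export_format: {export_format}")
    body = {
        "subject_type": subject_type,
        "subject_value": subject_value,
        "requester_email": requester_email,
        "requester_name": requester_name,
        "export_format": export_format,
    }
    # Optional fields are sent only when set, so server-side defaults apply otherwise.
    optional = {"subject_column": subject_column,
                "scope_catalogs": scope_catalogs, "notes": notes}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```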

GET /api/dsar/requests

List DSAR requests with optional status filter.

Query parameters: status (submitted/approved/cancelled/delivered/completed), limit (default 50).

GET /api/dsar/requests/{request_id}

Get full details for a specific DSAR request.

GET /api/dsar/requests/{request_id}/actions

Audit trail of all actions taken on a DSAR request.

GET /api/dsar/requests/overdue

DSAR requests that have exceeded their GDPR deadline.

GET /api/dsar/dashboard

Summary stats: total, pending, overdue, completion rate, avg days to complete.

PUT /api/dsar/requests/{request_id}/status

Update DSAR request status — approve, cancel, deliver, complete. Body: { "status": "approved", "reason": "…" } (reason required for cancel).

POST /api/dsar/requests/{request_id}/discover

Run async discovery to identify every table/row matching the subject across cloned catalogs. Body: { "subject_value": "…", "export_format": "csv" }. Returns a job_id; poll job status separately.

POST /api/dsar/requests/{request_id}/export

Export all subject data in the requested format (async job).

POST /api/dsar/requests/{request_id}/report

Generate the GDPR-compliant data access report (HTML + JSON) with metadata about which tables were scanned.


Governance

Glossary, DQ rules, SLA monitoring, certifications, ODCS data contracts, and the DQX-based data-quality engine. All endpoints are under /api/governance/.

POST /api/governance/init

Initialize all governance Delta tables (Glossary, DQ Rules, SLA, ODCS, DQX, Reconciliation, Alerts).

POST /api/governance/glossary

Create a glossary term. Body: { name, description, domain, aliases, owner }.

GET /api/governance/glossary

List all glossary terms.

GET /api/governance/glossary/{term_id}

Retrieve a single term.

DELETE /api/governance/glossary/{term_id}

Delete a glossary term.

POST /api/governance/glossary/link

Link a glossary term to one or more table columns (FQNs). Body: { term_id, column_fqns: [...] }.

POST /api/governance/search

Global metadata search across catalogs/tables/columns. Body: { query, catalogs, search_type, limit }.

POST /api/governance/dq/rules

Create a DQ rule (rowcount, null, uniqueness, custom SQL). Body: { table_fqn, rule_type, expression, severity, name }.

GET /api/governance/dq/rules

List DQ rules. Query: table_fqn, severity.

PUT /api/governance/dq/rules/{rule_id}

Update a DQ rule (name, expression, severity).

DELETE /api/governance/dq/rules/{rule_id}

Delete a DQ rule.

POST /api/governance/dq/cross-table-check

Run a cross-table consistency check. Body: { check_type, source_table, dest_table, predicate }.

POST /api/governance/dq/run

Execute one or more DQ rules. Body: { rule_ids, catalog, table_fqn }.

GET /api/governance/dq/results

Latest DQ rule execution results. Query: table_fqn.

GET /api/governance/dq/history

Historical DQ results. Query: rule_id, days (default 30).

POST /api/governance/certifications

Create a certification record. Body: { table_fqn, certifier, expiry_date, notes }.

GET /api/governance/certifications

List all certifications.

POST /api/governance/certifications/approve

Approve or reject a pending certification. Body: { cert_id, action: "approve"|"reject", reviewer_notes }.

POST /api/governance/sla/rules

Create an SLA rule. Body: { table_fqn, metric_type, threshold, severity }.

GET /api/governance/sla/rules

List all SLA rules.

POST /api/governance/sla/check

Run SLA compliance checks across all rules.

GET /api/governance/sla/status

Current SLA compliance status.

GET /api/governance/sla/compliance-trend

SLA compliance trend. Query: days (default 30).

DELETE /api/governance/sla/rules/{sla_id}

Delete an SLA rule.

POST /api/governance/odcs/contracts

Create an ODCS v3.1.0 data contract.

GET /api/governance/odcs/contracts

List ODCS contracts. Query: domain, status, table_fqn.

GET /api/governance/odcs/contracts/{contract_id}

Retrieve a single ODCS contract with full document.

PUT /api/governance/odcs/contracts/{contract_id}

Update an ODCS contract (partial fields).

DELETE /api/governance/odcs/contracts/{contract_id}

Delete an ODCS contract.

POST /api/governance/odcs/contracts/{contract_id}/validate

Run full ODCS validation against all 11 sections.

GET /api/governance/odcs/contracts/{contract_id}/versions

Version history for an ODCS contract.

GET /api/governance/odcs/contracts/{contract_id}/versions/{version}

Retrieve a specific version of a contract.

POST /api/governance/odcs/import

Import a contract from ODCS YAML. Body: { yaml_content }.

GET /api/governance/odcs/contracts/{contract_id}/export

Export an ODCS contract as YAML (text/yaml).

GET /api/governance/odcs/prefill

Returns server settings pre-filled from clone_config.yaml, for use when creating a new ODCS contract.

POST /api/governance/odcs/contracts/{contract_id}/map-dq

Map existing DQ rules to the contract's quality section.

POST /api/governance/odcs/contracts/{contract_id}/map-sla

Map existing SLA rules to the contract's slaProperties section.

POST /api/governance/odcs/migrate

Migrate legacy data contracts to ODCS v3.1.0.

POST /api/governance/odcs/contracts/{contract_id}/dqx-validate

Run DQX-based DataFrame validation for the contract's tables.

POST /api/governance/odcs/generate

Auto-generate an ODCS contract by introspecting a UC table. Body: { table_fqn, auto_save }.

POST /api/governance/odcs/generate-schema

Auto-generate ODCS contracts for every table in a schema.

POST /api/governance/odcs/generate-catalog

Auto-generate ODCS contracts for every table in a catalog.

GET /api/governance/dqx/spark-status

Spark session status for the DQX engine.

POST /api/governance/dqx/spark-configure

Configure the Spark session used by the DQX engine, either with a cluster_id or with serverless: true.

GET /api/governance/dqx/dashboard

DQX dashboard summary: total checks, pass rate, latest runs.

GET /api/governance/dqx/functions

List available DQX check functions (built-in validations).

POST /api/governance/dqx/profile

Profile a table with DQX Profiler and optionally auto-generate checks. Body: { table_fqn, auto_generate_checks }.

POST /api/governance/dqx/profile-schema

Profile every table in a schema and auto-generate checks.

POST /api/governance/dqx/profile-catalog

Profile every table in a catalog and auto-generate checks.

POST /api/governance/dqx/profile-stream

Server-Sent Events stream of live profiling progress (text/event-stream).
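Browsers' built-in `EventSource` only issues GET requests, so consuming this POST stream needs a client that reads the raw text/event-stream body. The sketch below handles only the SSE framing defined by the HTML standard (`event:` and `data:` fields, comment lines, blank-line dispatch). The JSON shape of the progress events is not documented here, so `data` is returned as a raw string. `parse_sse` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} dicts from text/event-stream lines."""
    event_type: str = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carried any data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        # id and retry fields are ignored in this sketch
```

For example, with the `requests` library you could feed it `response.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call.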

POST /api/governance/dqx/checks

Create a DQX check manually. Body: { table_fqn, check_type, name, arguments, criticality }.

GET /api/governance/dqx/checks

List DQX checks. Query: table_fqn.

DELETE /api/governance/dqx/checks/{check_id}

Delete a DQX check.

POST /api/governance/dqx/checks/delete-bulk

Bulk-delete DQX checks. Body: { check_ids: [...] } or { table_fqn, delete_all: true }.

POST /api/governance/dqx/clear-all

Clear ALL DQX data — checks, profiles, run results, definitions.

POST /api/governance/dqx/checks/{check_id}/toggle

Enable / disable a DQX check. Body: { enabled: true }.

PUT /api/governance/dqx/checks/{check_id}

Update a DQX check (name, criticality, arguments, filter).

POST /api/governance/dqx/run

Execute DQX checks on a table. Body: { table_fqn, check_ids }.

GET /api/governance/dqx/results

DQX run results. Query: table_fqn, limit (default 50).

POST /api/governance/dqx/run-all

Run DQX checks across every monitored table.

GET /api/governance/dqx/checks/export

Export DQX checks as YAML. Query: table_fqn.

POST /api/governance/dqx/checks/import

Import DQX checks from YAML. Body: { table_fqn, yaml_content }.

POST /api/governance/dqx/checks/save-to-delta

Save DQX checks to a user-specified Delta table. Body: { target_table, table_fqn }.

GET /api/governance/dqx/checks/audit-log

DQX check audit log — every change to checks. Query: check_id, table_fqn, limit.

GET /api/governance/dqx/profiles

List DQX profiles. Query: table_fqn.

POST /api/governance/dqx/profile-drift

Detect profile drift and recommend new/updated DQ checks. Body: { table_fqn }.

GET /api/governance/changes

Change history for governance entities. Query: entity_type, limit (default 100).


Data Quality

DQ observability: freshness monitoring, anomaly detection on metric streams, volume tracking, expectation suites, unified incidents, health scores, root-cause hints, downstream-impact analysis, and a monitoring scheduler. All endpoints are under /api/data-quality/.

GET /api/data-quality/freshness/{catalog}

Freshness check for all tables in a catalog. Flags tables not updated within max_stale_hours. Query: schema, max_stale_hours (default 24).

GET /api/data-quality/freshness/{catalog}/{schema}/{table}/history

Historical freshness snapshots for one table. Query: limit.

GET /api/data-quality/freshness/summary

Aggregate fresh/stale/unknown counts for the dashboard.
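The fresh/stale/unknown split can be sketched as follows, assuming each table has an optional last-update timestamp (for example, the time of its latest Delta commit) and that a table is stale once that timestamp is older than max_stale_hours. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def freshness_summary(last_updated: Dict[str, Optional[datetime]],
                      max_stale_hours: float = 24,
                      now: Optional[datetime] = None) -> Dict[str, int]:
    """Count tables as fresh, stale, or unknown (hypothetical helper).

    Timestamps are expected to be timezone-aware, like the default `now`.
    """
    now = now or datetime.now(timezone.utc)
    limit = timedelta(hours=max_stale_hours)
    counts = {"fresh": 0, "stale": 0, "unknown": 0}
    for ts in last_updated.values():
        if ts is None:
            counts["unknown"] += 1  # no last-update information available
        elif now - ts > limit:
            counts["stale"] += 1
        else:
            counts["fresh"] += 1
    return counts
```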

GET /api/data-quality/anomalies

Recent anomalies in DQ metrics. Query: limit, severity.

GET /api/data-quality/anomalies/metrics/{table_fqn}

Historical metric values with baseline bands. Query: metric_name, limit.

GET /api/data-quality/metrics/recent

Recent metric measurements. Query: limit.

POST /api/data-quality/anomalies/record

Record a metric measurement and auto-detect anomalies via z-score. Body: { table_fqn, column_name, metric_name, value }.
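A z-score measures how far a new value x lies from the mean μ of earlier measurements in units of their standard deviation σ, z = (x - μ) / σ. A minimal sketch, assuming a cut-off of 3.0 (the real threshold is configurable through /api/data-quality/anomaly-settings) and the sample standard deviation from Python's `statistics` module; `is_anomaly` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomaly(history: Sequence[float], value: float, threshold: float = 3.0) -> bool:
    """Flag `value` if |z| exceeds `threshold` relative to `history` (hypothetical helper)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)  # sample standard deviation
    if sigma == 0:
        return value != mu  # flat baseline: any change stands out
    z = (value - mu) / sigma
    return abs(z) > threshold
```

For row counts of 1000, 1020, 980, 1010, and 990, the mean is 1000 and σ is about 15.8, so a new count of 1500 gives z ≈ 31.6 and is flagged, while 1010 gives z ≈ 0.63 and is not.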

GET /api/data-quality/anomalies/system-tables

Scan Databricks system tables for anomalies — billing spikes, slow queries, cluster failures, storage growth. Query: days (default 7).

GET /api/data-quality/volume/{catalog}

Row counts for all tables in a catalog. Query: schema.

POST /api/data-quality/volume/snapshot

Take a volume snapshot and record as metrics. Body: { catalog, schema_name }.

GET /api/data-quality/volume/{catalog}/history

Historical row-count snapshots. Query: days (default 30).

GET /api/data-quality/suites

List expectation suites.

POST /api/data-quality/suites

Create an expectation suite. Body: { name, description, checks: [{ check_id, description }] }.

GET /api/data-quality/suites/{suite_id}

Get a single expectation suite.

DELETE /api/data-quality/suites/{suite_id}

Delete an expectation suite.

POST /api/data-quality/suites/{suite_id}/run

Execute every check in a suite.

GET /api/data-quality/incidents

Unified incident feed — failed DQ rules + stale tables + anomalies + reconciliation mismatches. Query: limit.

GET /api/data-quality/anomaly-settings

Current anomaly detection configuration.

PUT /api/data-quality/anomaly-settings

Update anomaly detection thresholds.

GET /api/data-quality/dqx-settings

Current DQX configuration.

PUT /api/data-quality/dqx-settings

Update DQX configuration.

GET /api/data-quality/root-cause/{table_fqn}

Find anomalies, schema changes, freshness gaps, and volume drops that co-occur on the table within the lookback window. Query: hours (default 24).

GET /api/data-quality/impact/{table_fqn}

Show the downstream tables, views, and jobs affected when a DQ check on this table fails.

POST /api/data-quality/gate/evaluate

Evaluate a DQ quality gate before clone/sync. Body: { table_fqn, suite_id, min_pass_rate }.

POST /api/data-quality/segmented-run

Run DQ checks per segment (per region, per date). Body: { table_fqn, segment_column, check_ids }.

GET /api/data-quality/segment-results

Per-segment DQ results for drill-down. Query: run_id, table_fqn, limit.

GET /api/data-quality/failure-samples

Sample failing rows for a DQX run. Query: run_id, table_fqn, limit.

GET /api/data-quality/coverage/{catalog}

Show which tables in the catalog have DQ checks and which do not, with the coverage percentage.

GET /api/data-quality/health-score/{catalog}

Aggregate DQ health score (0–100) from freshness + anomalies + reconciliation. Query: schema, max_stale_hours.

GET /api/data-quality/health/trend

Daily health scores for the trend chart. Query: days (default 7).

GET /api/data-quality/sla/compliance-trend

Daily SLA compliance trend. Query: days (default 30).

GET /api/data-quality/scorecard/{table_fqn}

Per-table quality scorecard aggregating completeness, freshness, schema stability, SLA compliance, anomalies.

GET /api/data-quality/monitoring/configs

List table monitoring configurations.

POST /api/data-quality/monitoring/configs

Create or update a monitoring config. Body: { table_fqn, metrics, frequency, auto_baseline, baseline_days, enabled }.

PUT /api/data-quality/monitoring/configs/{config_id}

Update an existing monitoring config.

DELETE /api/data-quality/monitoring/configs/{config_id}

Delete a monitoring config.

POST /api/data-quality/monitoring/configs/{config_id}/toggle

Toggle enabled/disabled for a monitoring config.

POST /api/data-quality/monitoring/bulk-add

Add multiple tables for monitoring at once. Body: { table_fqns: [...], metrics, frequency }.

POST /api/data-quality/monitoring/bulk-delete

Bulk-delete monitoring configs. Body: { config_ids: [...] }.

GET /api/data-quality/monitoring/discover/{catalog}

Discover tables for monitoring setup. Query: schema.

POST /api/data-quality/monitoring/run

Execute monitoring for every enabled config.

GET /api/data-quality/monitoring/scheduler

Scheduler status — enabled, frequency, last/next run.

POST /api/data-quality/monitoring/scheduler/enable

Enable the background scheduler. Query: frequency_minutes (1–1440, default 60).

POST /api/data-quality/monitoring/scheduler/disable

Disable the scheduler.

PUT /api/data-quality/monitoring/scheduler/frequency

Update scheduler frequency. Query: frequency_minutes.

POST /api/data-quality/monitoring/scheduler/run-now

Trigger an immediate monitoring run.

GET /api/data-quality/schedules

List scheduled DQ check runs.

POST /api/data-quality/schedules

Create a scheduled DQ run with cron. Body: { name, cron, schedule_type, table_fqn, suite_id, check_ids }.

DELETE /api/data-quality/schedules/{schedule_id}

Delete a DQ schedule.

POST /api/data-quality/schedules/{schedule_id}/pause

Pause a DQ schedule.

POST /api/data-quality/schedules/{schedule_id}/resume

Resume a paused DQ schedule.

POST /api/data-quality/schedules/{schedule_id}/run

Execute a DQ schedule immediately.


Reconciliation

Cross-metastore row-count, column-schema, and checksum reconciliation between source and destination catalogs, with SQL or Spark execution, batch jobs with WebSocket progress streaming, alert rules, remediation SQL generation, and cron-scheduled runs. All endpoints are under /api/reconciliation/.

GET /api/reconciliation/spark-status

Check Spark session availability for reconciliation.

POST /api/reconciliation/spark-configure

Configure the Spark session used for reconciliation, either with a cluster_id or with serverless: true.

POST /api/reconciliation/validate

Row-level reconciliation. Body: { source_catalog, destination_catalog, schema_name, table_name, exclude_schemas, use_checksum, max_workers, use_spark }.

POST /api/reconciliation/compare

Column-level reconciliation comparing schemas and optional checksums.

POST /api/reconciliation/profile

Column profiling and statistics for a catalog.

POST /api/reconciliation/preview

Preview a table pair before deep reconciliation — metadata, column-match status, sample rows.

POST /api/reconciliation/deep-validate

Full row-level reconciliation via Spark — classifies rows as matched / missing / extra / modified with column-level diffs. Body: { source_catalog, destination_catalog, schema_name, table_name, key_columns, include_columns, ignore_columns, sample_diffs, use_checksum, max_workers, ignore_nulls, ignore_case, ignore_whitespace, decimal_precision }.
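The four row classes can be illustrated in memory with plain Python. This is a sketch, not the Spark implementation: the function names are hypothetical, and the semantics of the comparison options are assumptions (ignore_whitespace strips leading and trailing spaces, decimal_precision rounds floats and decimals, ignore_nulls skips a column when either side is NULL).

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, List, Optional, Sequence


def normalize(value: Any, ignore_case: bool = False, ignore_whitespace: bool = False,
              decimal_precision: Optional[int] = None) -> Any:
    """Apply the (assumed) comparison options to a single cell value."""
    if isinstance(value, str):
        if ignore_whitespace:
            value = value.strip()
        if ignore_case:
            value = value.lower()
    elif isinstance(value, (float, Decimal)) and decimal_precision is not None:
        value = round(value, decimal_precision)
    return value


def classify_rows(source: Iterable[Dict[str, Any]], dest: Iterable[Dict[str, Any]],
                  key_columns: Sequence[str], ignore_nulls: bool = False,
                  **options: Any) -> Dict[str, List[Any]]:
    """Bucket rows as matched / missing / extra / modified, keyed on key_columns."""
    def index(rows):
        return {tuple(r[c] for c in key_columns): r for r in rows}

    src, dst = index(source), index(dest)
    result: Dict[str, List[Any]] = {"matched": [], "missing": [], "extra": [], "modified": []}
    for key, s_row in src.items():
        d_row = dst.get(key)
        if d_row is None:
            result["missing"].append(key)  # in source, absent from destination
            continue
        diffs = {}
        for col in s_row.keys() | d_row.keys():
            a, b = s_row.get(col), d_row.get(col)
            if ignore_nulls and (a is None or b is None):
                continue
            if normalize(a, **options) != normalize(b, **options):
                diffs[col] = (a, b)  # column-level diff: (source, destination)
        if diffs:
            result["modified"].append((key, diffs))
        else:
            result["matched"].append(key)
    result["extra"] = [k for k in dst if k not in src]  # only in destination
    return result
```

Indexing both sides by the key tuple makes each lookup constant-time; the Spark version presumably does the equivalent with a full outer join on key_columns.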

GET /api/reconciliation/history

Past reconciliation runs. Query: limit, run_type (row-level/column-level/deep), source_catalog.

POST /api/reconciliation/compare-runs

Compare two reconciliation runs side-by-side. Body: { run_id_a, run_id_b }.

POST /api/reconciliation/execute-sql

Execute arbitrary SQL via Spark Connect or SQL warehouse. Body: { sql, use_spark, warehouse_id }.

GET /api/reconciliation/alerts/rules

List alert rules for reconciliation metrics.

POST /api/reconciliation/alerts/rules

Create an alert rule. Body: { name, metric, operator, threshold, severity, source_catalog, destination_catalog, notify_channels }.

DELETE /api/reconciliation/alerts/rules/{rule_id}

Delete an alert rule.

GET /api/reconciliation/alerts/history

Alert trigger history. Query: limit.

POST /api/reconciliation/remediate

Generate SQL statements to fix reconciliation mismatches. Body: { source_catalog, destination_catalog, schema_name, table_name, key_columns, fix_type }.

GET /api/reconciliation/schedules

List scheduled reconciliation jobs.

POST /api/reconciliation/schedules

Create a scheduled reconciliation job. Body: { name, source_catalog, destination_catalog, cron, schema_name, table_name, key_columns, comparison_options }.

DELETE /api/reconciliation/schedules/{schedule_id}

Delete a schedule.

POST /api/reconciliation/schedules/{schedule_id}/pause

Pause a reconciliation schedule.

POST /api/reconciliation/schedules/{schedule_id}/resume

Resume a paused schedule.

POST /api/reconciliation/batch-validate

Submit a batch row-level reconciliation job. Body: { source_catalog, destination_catalog, tables: [{schema_name, table_name}], use_checksum, max_workers, use_spark }. Returns { job_id, status: "queued" }.

GET /api/reconciliation/batch-validate/{job_id}

Get progress of a batch row-level job.

DELETE /api/reconciliation/batch-validate/{job_id}

Cancel a queued batch row-level job.

POST /api/reconciliation/batch-compare

Submit a batch column-level comparison job.

GET /api/reconciliation/batch-compare/{job_id}

Get progress of a batch column-level job.

DELETE /api/reconciliation/batch-compare/{job_id}

Cancel a queued batch column-level job.

POST /api/reconciliation/batch-deep-validate

Submit a batch deep reconciliation job.

GET /api/reconciliation/batch-deep-validate/{job_id}

Get progress of a batch deep reconciliation job.

DELETE /api/reconciliation/batch-deep-validate/{job_id}

Cancel a queued batch deep reconciliation job.

GET /api/reconciliation/history/{run_id}/details

Per-table details for a specific reconciliation run.

WebSocket /api/reconciliation/ws/{job_id}

Live batch reconciliation progress streaming. Client sends {"type":"ping"}; server broadcasts {"type":"progress", …} events and a final {"type":"complete", …} message.


Master Data Management (MDM)

Entity resolution, golden records, match-pair stewardship, hierarchies, and matching rules. All endpoints are under /api/mdm/.

POST /api/mdm/init

Initialize MDM tables and schema.

GET /api/mdm/dashboard

Dashboard summary — entities, match pairs, stewardship queue metrics.

GET /api/mdm/entities

List golden records. Query: entity_type, status, limit.

GET /api/mdm/entities/{entity_id}

Retrieve a golden record and its source records.

POST /api/mdm/entities

Create a golden record. Body: { entity_type, display_name, attributes }.

PUT /api/mdm/entities/{entity_id}

Update a golden record.

DELETE /api/mdm/entities/{entity_id}

Delete a golden record.

POST /api/mdm/ingest

Ingest source records and link to entities. Body: { catalog, schema_name, table, entity_type, key_column, trust_score }.

POST /api/mdm/detect

Detect duplicate records via matching rules. Body: { entity_type, auto_merge_threshold, review_threshold }.

GET /api/mdm/pairs

List match-pair candidates (potential duplicates). Query: entity_type, status, limit.

POST /api/mdm/merge

Merge two records — one becomes the golden record. Body: { pair_id, strategy: "keep_a"|"keep_b"|"create_new" }.
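A merge needs only the pair ID and one of the three strategies. This sketch checks the strategy client-side before calling the endpoint. The helper names are hypothetical, and the pair ID is assumed to be a plain string taken from GET /api/mdm/pairs.

```shell
API="${API:-http://localhost:8080/api}"

# Build the POST /api/mdm/merge body, rejecting unknown strategies.
# $1 = pair_id (assumed to be a string), $2 = keep_a | keep_b | create_new
mdm_merge_body() {
  case "$2" in
    keep_a|keep_b|create_new) ;;
    *) echo "invalid strategy: $2" >&2; return 1 ;;
  esac
  printf '{"pair_id": "%s", "strategy": "%s"}' "$1" "$2"
}

# Send the merge request only if the body was built successfully.
mdm_merge() {
  body=$(mdm_merge_body "$1" "$2") || return 1
  curl -s -X POST "$API/mdm/merge" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, `mdm_merge "$PAIR_ID" create_new` asks the server to build a fresh golden record from both sides of the pair.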

POST /api/mdm/split

Split a golden record back into separate entities. Body: { entity_id }.

GET /api/mdm/rules

List matching rules. Query: entity_type.

POST /api/mdm/rules

Create a matching rule. Body: { entity_type, name, field, match_type: "exact"|"fuzzy"|"phonetic", weight, threshold, enabled }.

DELETE /api/mdm/rules/{rule_id}

Delete a matching rule.

GET /api/mdm/stewardship

List stewardship tasks. Query: status, priority, limit.

POST /api/mdm/stewardship/{task_id}/approve

Approve a stewardship task.

POST /api/mdm/stewardship/{task_id}/reject

Reject a stewardship task. Body: { reason }.

GET /api/mdm/hierarchies

List organisational hierarchies.

POST /api/mdm/hierarchies

Create a hierarchy. Body: { name, entity_type }.

GET /api/mdm/hierarchies/{hierarchy_id}

Retrieve a hierarchy and its nodes.

POST /api/mdm/hierarchies/{hierarchy_id}/nodes

Add a node. Body: { entity_id, label, parent_node_id, level }.


Alert routing

Smart rule-based alert distribution, deduplication, and digest automation. All endpoints under /api/alerts/.

GET /api/alerts/routing-rules

List all routing rules.

POST /api/alerts/routing-rules

Create a routing rule. Body: { name, table_pattern, severity_filter, event_type_filter, route_to_team, channel, channel_config }.

PUT /api/alerts/routing-rules/{rule_id}

Update a routing rule.

DELETE /api/alerts/routing-rules/{rule_id}

Delete a routing rule.

GET /api/alerts/inbox

Get the alert inbox. Query: status, severity.

POST /api/alerts/route

Route a new alert to matching rules. Body: { event_type, table_fqn, severity, title, message }.

POST /api/alerts/inbox/{alert_id}/acknowledge

Mark an alert as acknowledged.

POST /api/alerts/inbox/{alert_id}/resolve

Mark an alert as resolved.

POST /api/alerts/inbox/{alert_id}/snooze

Snooze an alert. Query: hours (default 4).
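Snoozing takes its duration as a query parameter rather than a body. A small sketch, assuming hours must be a positive integer (the reference only documents the default of 4). `snooze_url` and `snooze_alert` are hypothetical helpers.

```shell
API="${API:-http://localhost:8080/api}"

# Build the snooze URL; hours defaults to 4 to mirror the documented default.
# $1 = alert_id, $2 = hours (optional; assumed to be a positive integer)
snooze_url() {
  hours="${2:-4}"
  case "$hours" in
    ''|*[!0-9]*) echo "hours must be a positive integer" >&2; return 1 ;;
  esac
  [ "$hours" -gt 0 ] || { echo "hours must be a positive integer" >&2; return 1; }
  printf '%s/alerts/inbox/%s/snooze?hours=%s' "$API" "$1" "$hours"
}

# POST to the snooze URL only if validation passed.
snooze_alert() {
  url=$(snooze_url "$1" "$2") || return 1
  curl -s -X POST "$url"
}
```

For example, `snooze_alert "$ALERT_ID" 8` silences an alert for eight hours.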

GET /api/alerts/analytics

Alert analytics and trends. Query: days (default 30).

GET /api/alerts/digests

List digest configurations.

POST /api/alerts/digests

Create a digest config. Body: { recipient, frequency, filters }.

DELETE /api/alerts/digests/{digest_id}

Delete a digest config.


FinOps

Cost visibility and optimisation intelligence via Databricks system tables and optional Azure Cost Management. All endpoints under /api/finops/.

GET /api/finops/billing

Query billing costs from system.billing.usage. Query: days (default 30, max 365).
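A sketch that rejects out-of-range values before the request is sent. The default of 30 and the upper bound of 365 come from this reference; the lower bound of 1 is an assumption, and `billing_url` is a hypothetical helper.

```shell
API="${API:-http://localhost:8080/api}"

# Build the billing URL after checking days against the documented range.
# $1 = days (optional, default 30, max 365; minimum of 1 assumed)
billing_url() {
  days="${1:-30}"
  case "$days" in
    ''|*[!0-9]*) echo "days must be an integer" >&2; return 1 ;;
  esac
  if [ "$days" -lt 1 ] || [ "$days" -gt 365 ]; then
    echo "days must be between 1 and 365" >&2
    return 1
  fi
  printf '%s/finops/billing?days=%s' "$API" "$days"
}
```

For example, `curl -s "$(billing_url 90)"` fetches the last quarter of billing data.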

GET /api/finops/warehouses

List SQL warehouses with state and config — flags warehouses missing auto_stop_enabled.

GET /api/finops/warehouse-events

Warehouse lifecycle events (start/stop/scale). Query: days.

GET /api/finops/clusters

List compute clusters with state and config.

GET /api/finops/node-utilization

Node CPU/memory utilisation trends. Query: days (default 7, max 90).

GET /api/finops/query-stats

Query performance stats from system.query.history. Query: days.

GET /api/finops/storage

Table sizes from information_schema. Query: catalog (required).

GET /api/finops/recommendations

Combined FinOps recommendations drawn from the optimisation engine, warehouse data, and utilisation data. Query: catalog.

GET /api/finops/query-costs

Per-query cost attribution via hourly warehouse allocation. Query: days.

GET /api/finops/job-costs

Per-job cost from billing.usage. Query: days.

GET /api/finops/system-status

Report which system tables are accessible. The FinOps page uses this to disable surfaces gracefully when access to a system table hasn't been granted.

GET /api/finops/azure/status

Azure Cost Management configuration and the auth method of the current session.

GET /api/finops/azure/costs

Query Azure Cost Management for trends and service breakdown. Query: days.

POST /api/finops/azure/config

Save the Azure subscription, resource group, and tenant configuration. Body: { subscription_id, resource_group, tenant_id }.


System insights

Unified compute / storage / metadata health via system tables. All endpoints under /api/system-insights/.

POST /api/system-insights/billing

Billing usage by date and SKU. Body: { warehouse_id?, catalog?, days: 30 }.
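In these bodies a trailing `?` marks an optional field. A minimal sketch that builds the billing body and leaves out optional keys that weren't supplied, assuming the server treats a missing key as "no filter". The same body shape applies to /table-usage. Helper names are hypothetical, and values are inserted without JSON escaping, so keep them to plain identifiers.

```shell
API="${API:-http://localhost:8080/api}"

# Build a { warehouse_id?, catalog?, days } body, omitting optional keys
# that were not supplied.
# $1 = days (default 30), $2 = warehouse_id (optional), $3 = catalog (optional)
insights_body() {
  body="{\"days\": ${1:-30}"
  [ -n "$2" ] && body="$body, \"warehouse_id\": \"$2\""
  [ -n "$3" ] && body="$body, \"catalog\": \"$3\""
  printf '%s}' "$body"
}

insights_billing() {
  curl -s -X POST "$API/system-insights/billing" \
    -H "Content-Type: application/json" \
    -d "$(insights_body "$@")"
}
```

For example, `insights_billing 7 '' main` requests a week of billing for the `main` catalog without pinning a warehouse.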

POST /api/system-insights/optimization

Predictive optimisation recommendations (OPTIMIZE, VACUUM, ZORDER).

POST /api/system-insights/jobs

Job run timeline from system.lakeflow. Body: { days, job_name_filter? }.

POST /api/system-insights/summary

Unified summary from billing + optimisation + jobs + lineage + storage in one call.

POST /api/system-insights/warehouses

List SQL warehouses with state and configuration.

POST /api/system-insights/clusters

List clusters with state and recent events. Body: { max_events: 10 }.

POST /api/system-insights/pipelines

List DLT pipelines with state and recent events. Body: { max_events_per_pipeline: 10 }.

POST /api/system-insights/query-performance

Recent query execution performance. Body: { warehouse_id?, days: 30, max_results: 100 }.

POST /api/system-insights/metastore

Current metastore info and catalog/schema counts.

POST /api/system-insights/alerts

List all SQL alerts with current state.

POST /api/system-insights/table-usage

Table access patterns from audit logs. Body: { warehouse_id?, catalog?, days: 30 }.


Federation

Lakehouse Federation — manage federated connections (MySQL, PostgreSQL, Snowflake), list foreign catalogs and tables, migrate foreign tables to managed Delta. All endpoints under /api/federation/.

GET /api/federation/catalogs

List all foreign (federated) catalogs in the metastore.

GET /api/federation/connections

List all connections (MySQL, PostgreSQL, Snowflake, etc.).

GET /api/federation/connections/{name}

Export a connection's configuration (sensitive fields redacted).

POST /api/federation/connections/clone

Create a new connection from an exported definition. Body: { connection_name, new_name, credentials, dry_run }. Credentials must be supplied explicitly because exports redact them.

POST /api/federation/tables

List tables in a foreign catalog. Body: { catalog, warehouse_id?, schema_filter? }.

POST /api/federation/migrate

Materialize a foreign table into a managed Delta table (CTAS). Body: { foreign_fqn, dest_fqn, warehouse_id?, dry_run }.
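Because migrate runs a CTAS, previewing with dry_run first is a cheap safeguard. In this sketch the client defaults dry_run to true, so materializing requires passing `false` explicitly. The server-side default isn't documented here, and the helper and table names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Build the /federation/migrate body; dry_run defaults to true client-side.
# $1 = foreign_fqn, $2 = dest_fqn, $3 = dry_run (true | false, default true)
migrate_body() {
  dry_run="${3:-true}"
  case "$dry_run" in
    true|false) ;;
    *) echo "dry_run must be true or false" >&2; return 1 ;;
  esac
  printf '{"foreign_fqn": "%s", "dest_fqn": "%s", "dry_run": %s}' "$1" "$2" "$dry_run"
}

federation_migrate() {
  body=$(migrate_body "$@") || return 1
  curl -s -X POST "$API/federation/migrate" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, `federation_migrate pg_fed.public.orders main.bronze.orders` previews the statement, and adding `false` as a third argument runs it.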


ML assets

Inventory and clone Databricks ML components — registered models, feature tables, vector search indexes, serving endpoints. All endpoints under /api/ml-assets/.

POST /api/ml-assets/list

List registered models, feature tables, vector indexes in a catalog. Body: { source_catalog, warehouse_id?, schemas? }.

POST /api/ml-assets/clone

Clone ML assets from source to destination catalog. Body: { source_catalog, destination_catalog, include_models, include_feature_tables, include_vector_indexes, include_serving_endpoints, copy_versions, clone_type, schemas, max_workers, dry_run }.

POST /api/ml-assets/models/list

List registered models in a catalog.

POST /api/ml-assets/vector-indexes/list

List vector search indexes in a catalog.

GET /api/ml-assets/serving-endpoints

List all model serving endpoints.

POST /api/ml-assets/serving-endpoints/export

Export a serving endpoint configuration.

POST /api/ml-assets/serving-endpoints/import

Create a serving endpoint from an exported config. Body: { config, dest_catalog, source_catalog, name_suffix, dry_run }.


AI

AI features powered by the Anthropic API or Databricks Model Serving: narratives, NL clone parsing, DQ rule suggestions, and PII remediation. The backend is selected via the X-Databricks-Model header. All endpoints under /api/ai/.

GET /api/ai/status

Check whether AI features are available.

POST /api/ai/summarize

Generate an AI narrative summary. Body: { context_type, data }.

POST /api/ai/clone-builder

Parse a natural-language clone request into structured config. Body: { query, available_catalogs }.

POST /api/ai/dq-suggestions

Suggest data quality rules from profiling results. Body: { profiling_results, table_name }.

POST /api/ai/pii-remediation

AI-powered PII remediation recommendations. Body: { scan_results }.


AI assistant

Natural-language SQL generation, execution with explanations, Genie integration, multi-turn chat. All endpoints under /api/ai-assistant/.

POST /api/ai-assistant/nl-to-sql

Convert natural language to SQL. Body: { question, catalog?, schema_name? }.

POST /api/ai-assistant/execute-nl

Convert a natural-language question to SQL, execute it, and return the results with an AI-generated explanation. Body: { question, catalog?, schema_name? }.

POST /api/ai-assistant/genie-query

Send a question to a Databricks Genie space. Body: { question, space_id }.

POST /api/ai-assistant/chat

Multi-turn chat about data. Body: { messages, catalog?, schema_name? }.


Data Product Marketplace

Publish, discover, and subscribe to curated data products with SLA guarantees and quality requirements. All endpoints under /api/data-products/.

GET /api/data-products/

List data products. Query: status, domain.

POST /api/data-products/

Create a data product. Body: { name, description, domain, owner_team, owner_email, tables, sla_guarantees, quality_requirements, tags }.

GET /api/data-products/{product_id}

Retrieve a data product.

PUT /api/data-products/{product_id}

Update product fields (any subset).

DELETE /api/data-products/{product_id}

Delete a product.

POST /api/data-products/{product_id}/publish

Publish the product to the marketplace, making it discoverable.

POST /api/data-products/{product_id}/subscribe

Subscribe a team. Body: { subscriber_team, subscriber_email, use_case, notification_prefs }.

GET /api/data-products/{product_id}/subscribers

List subscribers for a product.


Data Environment Manager

Provision ephemeral sandboxes with masking, cost budgets, TTL cleanup, and access grants. All endpoints under /api/environments/.

GET /api/environments/

List environments. Query: status.

POST /api/environments/

Create an ephemeral environment. Body: { name, source_catalog, tables, masking_profile, ttl_hours, cost_budget, clone_type, access_grants }.

GET /api/environments/{env_id}

Get environment details.

POST /api/environments/{env_id}/extend

Extend the environment's TTL. Query/body: additional_hours (number of hours to add).

DELETE /api/environments/{env_id}

Destroy an environment and its resources.

POST /api/environments/cleanup

Trigger manual cleanup of expired environments.

GET /api/environments/templates/list

List saved environment templates.

POST /api/environments/templates

Create a reusable template. Body: { name, description, config }.

DELETE /api/environments/templates/{template_id}

Delete a template.


Promotion Plans

Multi-hop catalog clones across environments (dev → staging → prod) with client-side hop sequencing. All endpoints under /api/promotions/.

GET /api/promotions/plans

List built-in promotion plans with their hop definitions.

GET /api/promotions/plans/{plan_key}

Retrieve a specific plan, including all hop steps.

POST /api/promotions/plans/{plan_key}/run

Submit the first hop of a plan and return all hops with their assigned job IDs and statuses. Body: { prefix, warehouse_id, max_workers }. The response includes hops[], each with name, source_catalog, dest_catalog, job_id, and status.


Delta Sharing

Manage shares, recipients, and table grants for secure cross-org data distribution. All endpoints under /api/delta-sharing/.

GET /api/delta-sharing/shares

List all Delta Sharing shares.

GET /api/delta-sharing/shares/{name}

Get details for a share including shared objects and recipient grants.

POST /api/delta-sharing/shares

Create a new share. Body: { name, comment }.

POST /api/delta-sharing/shares/grant

Add a table to a share. Body: { share_name, table_fqn, shared_as }.

POST /api/delta-sharing/shares/revoke

Remove a table from a share. Body: { share_name, table_fqn }.

POST /api/delta-sharing/shares/validate/{name}

Validate that all objects in a share are accessible.

GET /api/delta-sharing/recipients

List all recipients.

POST /api/delta-sharing/recipients

Create a new recipient. Body: { name, comment, authentication_type, sharing_code }.

POST /api/delta-sharing/recipients/grant

Grant SELECT access on a share to a recipient. Body: { share_name, recipient_name }.
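Sharing one table with one recipient chains four of these calls. Below is a sketch that uses curl's `-f` flag so an HTTP error response stops the chain. It assumes the recipient's `authentication_type` and `sharing_code` can be omitted, which this reference doesn't confirm. All helper names are hypothetical, and values are not JSON-escaped.

```shell
API="${API:-http://localhost:8080/api}"

# Body builders for the four calls.
named_body()           { printf '{"name": "%s", "comment": "%s"}' "$1" "$2"; }
share_grant_body()     { printf '{"share_name": "%s", "table_fqn": "%s", "shared_as": "%s"}' "$1" "$2" "$3"; }
recipient_grant_body() { printf '{"share_name": "%s", "recipient_name": "%s"}' "$1" "$2"; }

# POST a body to a Delta Sharing route; -f makes curl exit non-zero on HTTP >= 400.
post_sharing() {
  curl -sf -X POST "$API/delta-sharing/$1" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Create share, add table, create recipient, grant share; stop at the first failure.
# $1 = share, $2 = table_fqn, $3 = shared_as, $4 = recipient
share_table_with() {
  post_sharing shares "$(named_body "$1" "share for $4")" &&
    post_sharing shares/grant "$(share_grant_body "$1" "$2" "$3")" &&
    post_sharing recipients "$(named_body "$4" "recipient of $1")" &&
    post_sharing recipients/grant "$(recipient_grant_body "$1" "$4")"
}
```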


Continuous Sync

Near-real-time streaming replication via Structured Streaming, with in-process stream lifecycle management. All endpoints under /api/continuous-sync/.

POST /api/continuous-sync/plan

Generate a streaming-job plan without submitting (preview/download). Body: { source_catalog, destination_catalog, tables?, schema_name?, trigger_ms, checkpoint_root? }.

POST /api/continuous-sync/start

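Submit and start a streaming job. Same body as /plan. Returns a StreamRecord with stream_id, run_id, and status. The endpoint returns 200 even when submission fails (so the UI can render consistently), so inspect status in the response rather than relying on the HTTP status code.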
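Because /start can answer 200 on a failed submission, a script has to read the returned StreamRecord. A sketch, assuming single-line JSON and that a failed submission is reported as a `status` of `failed`. The actual value isn't listed here, so it is a parameter. Helper names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Extract the last "status" string value from single-line JSON.
json_status() {
  printf '%s' "$1" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeeds if the record's status equals the failure value ($2, assumed "failed").
stream_failed() {
  [ "$(json_status "$1")" = "${2:-failed}" ]
}

# Start a stream, print the StreamRecord, and return non-zero if it reports failure.
# $1 = JSON body, same shape as /plan
start_stream() {
  record=$(curl -s -X POST "$API/continuous-sync/start" \
    -H "Content-Type: application/json" -d "$1") || return 1
  printf '%s\n' "$record"
  ! stream_failed "$record"
}
```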

GET /api/continuous-sync/streams

List all registered streams. Query: refresh (poll Databricks for fresh state).

GET /api/continuous-sync/streams/{stream_id}

Get current state for one stream (always polls Databricks).

POST /api/continuous-sync/streams/{stream_id}/stop

Stop a stream. Idempotent.

POST /api/continuous-sync/streams/{stream_id}/restart

Cancel and resubmit a stream with the same parameters (post-crash / schema-drift recovery).


Approval

Approval workflow for governed clone operations. All endpoints under /api/approvals/.

GET /api/approvals/pending

List all pending approval requests.

GET /api/approvals/{request_id}

Fetch one approval request by id (works for any status).

POST /api/approvals/{request_id}/approve

Approve a pending request. Idempotent on terminal states.

POST /api/approvals/{request_id}/deny

Deny a pending request. Body: { reason }.
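The reason is free text, so it needs JSON escaping before it goes into the body. A minimal sketch that escapes backslashes and double quotes with sed. It does not handle newlines or other control characters, and the helper names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Escape backslashes, then double quotes, for use inside a JSON string (single-line text only).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

deny_body() {
  printf '{"reason": "%s"}' "$(json_escape "$1")"
}

# $1 = request_id, $2 = free-text reason
deny_request() {
  curl -s -X POST "$API/approvals/$1/deny" \
    -H "Content-Type: application/json" \
    -d "$(deny_body "$2")"
}
```

For example, `deny_request "$REQUEST_ID" 'missing "owner" tag'` sends the quotes safely escaped.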


Anomaly Correlation

Cross-metric anomaly correlation — group co-occurring anomalies and surface candidate root-cause tables. All endpoints under /api/anomalies/.

GET /api/anomalies/groups

Recent anomaly correlation groups.

GET /api/anomalies/groups/{group_id}

Detail for a correlation group.

POST /api/anomalies/correlate

Run correlation analysis. Query: time_window_minutes (default 120, min 10).

GET /api/anomalies/root-causes

Top root-cause tables across recent anomalies.


Trust Score

Composite trust scores per table — DQ + freshness + anomaly + schema stability + PII + lineage. All endpoints under /api/trust/.

GET /api/trust/scores/{catalog}

Trust scores for every table in a catalog.

GET /api/trust/scores/{catalog}/{schema}/{table}

Trust score for a specific table.

GET /api/trust/scores/{catalog}/{schema}/{table}/history

Trust score trend over time for one table.

POST /api/trust/compute/{catalog}

Compute trust scores for a catalog. Query: schema_filter.

GET /api/trust/config

Trust score dimension weights.

PUT /api/trust/config

Update dimension weights. Body: { dq, freshness, anomaly, schema_stability, pii, lineage } (defaults: 0.30 / 0.25 / 0.15 / 0.10 / 0.10 / 0.10).
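The default weights sum to exactly 1.0 (0.30 + 0.25 + 0.15 + 0.10 + 0.10 + 0.10). This reference doesn't say whether the server requires or normalises that, so the sketch below treats it only as a client-side guard. awk does the addition because shell arithmetic is integer-only. Helper names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Check that the six weights total 1.0 within a small tolerance.
# Args: dq freshness anomaly schema_stability pii lineage
weights_sum_to_one() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" -v e="$5" -v f="$6" 'BEGIN {
    s = a + b + c + d + e + f
    if (s > 0.999 && s < 1.001) exit 0
    exit 1
  }'
}

trust_config_body() {
  printf '{"dq": %s, "freshness": %s, "anomaly": %s, "schema_stability": %s, "pii": %s, "lineage": %s}' "$@"
}

# PUT the weights only if they pass the guard.
put_trust_config() {
  weights_sum_to_one "$@" || { echo "weights must sum to 1.0" >&2; return 1; }
  curl -s -X PUT "$API/trust/config" \
    -H "Content-Type: application/json" \
    -d "$(trust_config_body "$@")"
}
```

For example, `put_trust_config 0.35 0.25 0.10 0.10 0.10 0.10` shifts weight from anomaly to DQ while keeping the total at 1.0.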


Coverage

DQ coverage: which tables have checks and which don't, with the gaps ranked. All endpoints under /api/coverage/.

GET /api/coverage/{catalog}

Coverage map for a catalog.

GET /api/coverage/{catalog}/summary

Aggregate coverage summary.

GET /api/coverage/{catalog}/gaps

Uncovered tables ranked by priority.

POST /api/coverage/{catalog}/compute

Compute a coverage snapshot. Query: schema_filter.


Cost of Poor Quality (COPQ)

Quantify the business cost of DQ failures: engineer time, re-runs, SLA breaches, and downstream disruption. All endpoints under /api/copq/.

GET /api/copq/summary

COPQ summary with breakdown. Query: days (default 30).

GET /api/copq/by-table

COPQ ranked by table. Query: days.

GET /api/copq/trends

Weekly COPQ trends. Query: days (default 90, min 7).

GET /api/copq/config

Cost assumptions used for COPQ calculation.

PUT /api/copq/config

Update cost assumptions. Body (defaults in parentheses): { hourly_engineer_cost (75.0), per_rerun_cost (25.0), sla_breach_penalty (500.0), downstream_disruption_cost (100.0), avg_responders_per_incident (2) }.

POST /api/copq/compute

Auto-compute COPQ events from DQ failures.


Notifications (preferences + webhooks)

User notification preferences and webhook configuration. All endpoints under /api/notifications/.

GET /api/notifications/preferences

Notification preferences and configured webhooks.

PUT /api/notifications/preferences

Save notification preferences.

GET /api/notifications/webhooks

List configured webhooks.

POST /api/notifications/webhooks

Add a webhook configuration.

DELETE /api/notifications/webhooks/{webhook_id}

Remove a webhook.

POST /api/notifications/webhooks/test

Send a test notification to a webhook.


Scheduled clones

Cron-scheduled clone, sync, and incremental_sync jobs, with optional creation of a Databricks Job for workspace-side execution. All endpoints under /api/schedules/ (plural; distinct from the singular /api/schedule clone-side schedules).

GET /api/schedules

List all saved schedules (active + paused) with computed next_run.

POST /api/schedules

Create a schedule. Body: { name, source_catalog, destination_catalog, cron, clone_type, job_type ("clone"|"sync"|"incremental_sync"), template? }.
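A sketch that checks job_type against the three documented values before creating the schedule. The cron dialect isn't specified here, so the expression is passed through unchanged. The optional template field is omitted, and the helpers are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Build a POST /api/schedules body, validating job_type only.
# $1 name, $2 source_catalog, $3 destination_catalog, $4 cron, $5 clone_type, $6 job_type
schedule_body() {
  case "$6" in
    clone|sync|incremental_sync) ;;
    *) echo "job_type must be clone, sync, or incremental_sync" >&2; return 1 ;;
  esac
  printf '{"name": "%s", "source_catalog": "%s", "destination_catalog": "%s", "cron": "%s", "clone_type": "%s", "job_type": "%s"}' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

create_schedule() {
  body=$(schedule_body "$@") || return 1
  curl -s -X POST "$API/schedules" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

For example, `create_schedule nightly-dev prod_catalog dev_catalog '0 2 * * *' deep clone`, where `deep` is an assumed clone_type value. Quote the cron expression so the shell doesn't expand the asterisks.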

POST /api/schedules/{schedule_id}/pause

Pause a schedule (clears next_run).

POST /api/schedules/{schedule_id}/resume

Resume a paused schedule.

DELETE /api/schedules/{schedule_id}

Delete a schedule (idempotent).


Lakehouse Monitor

Clone Databricks Lakehouse Monitoring quality monitors between catalogs. All endpoints under /api/lakehouse-monitor/.

POST /api/lakehouse-monitor/list

List quality monitors in a catalog. Body: { source_catalog, warehouse_id?, schema_filter? }.

POST /api/lakehouse-monitor/clone

Clone monitor definitions from source to destination tables. Body: { source_catalog, destination_catalog, warehouse_id?, schema_filter?, dry_run }.

POST /api/lakehouse-monitor/compare

Compare monitor metrics between source and destination tables. Body: { source_table, destination_table, warehouse_id? }.


Observability

Unified observability dashboard combining freshness + SLA + DQ + anomaly signals. All endpoints under /api/observability/.

GET /api/observability/dashboard

Full dashboard — health score, summary, top issues, category breakdown.

GET /api/observability/health-score

Composite health score (0–100).

GET /api/observability/issues

Top issues across all observability categories.

GET /api/observability/trends/{metric}

Time-series sparkline data for one metric (freshness, sla, dq).
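The {metric} path segment takes one of the three values listed. A small sketch that validates it and fetches all three series. Helper names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Build the trends URL for one of the documented metrics.
trend_url() {
  case "$1" in
    freshness|sla|dq) printf '%s/observability/trends/%s' "$API" "$1" ;;
    *) echo "metric must be freshness, sla, or dq" >&2; return 1 ;;
  esac
}

# Print each metric's sparkline data under a small header.
fetch_all_trends() {
  for m in freshness sla dq; do
    echo "== $m"
    curl -s "$(trend_url "$m")"
    echo
  done
}
```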

GET /api/observability/category-health

Per-category health breakdown with weights.


Schema Evolution

Detect schema drift between source and destination tables and apply ALTER TABLE statements to converge. All endpoints under /api/schema-evolution/.

POST /api/schema-evolution/detect

Compare source and destination schemas. Body: { source_catalog, destination_catalog, schema_name, table_name }.

POST /api/schema-evolution/apply

Apply detected changes as ALTER TABLE. Body: { destination_catalog, schema_name, table_name, changes, dry_run (default true), drop_removed (default false) }.
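Because dry_run defaults to true, apply can be called once to preview the ALTER TABLE statements and again with dry_run set to false. This sketch assumes the `changes` value from /detect is passed through verbatim as raw JSON (for example, extracted with a tool such as jq). Helper names are hypothetical.

```shell
API="${API:-http://localhost:8080/api}"

# Build the apply body; dry_run and drop_removed default to the documented true / false.
# $1 destination_catalog, $2 schema_name, $3 table_name, $4 changes (raw JSON),
# $5 dry_run (default true), $6 drop_removed (default false)
apply_body() {
  printf '{"destination_catalog": "%s", "schema_name": "%s", "table_name": "%s", "changes": %s, "dry_run": %s, "drop_removed": %s}' \
    "$1" "$2" "$3" "$4" "${5:-true}" "${6:-false}"
}

apply_schema_changes() {
  curl -s -X POST "$API/schema-evolution/apply" \
    -H "Content-Type: application/json" \
    -d "$(apply_body "$@")"
}
```

For example, `apply_schema_changes dev_catalog sales orders "$CHANGES"` previews the statements, and adding `false` as a fifth argument executes them.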

POST /api/schema-evolution/evolve-catalog

Detect and apply changes across every table in a catalog. Body: { source_catalog, destination_catalog, exclude_schemas, dry_run, drop_removed, max_workers }.


Clone Provenance

Cryptographic provenance — sign clone manifests with HMAC, verify signatures later. All endpoints under /api/clone-provenance/.

POST /api/clone-provenance/sign/{job_id}

Sign the manifest for a completed clone job by ID using HMAC.

POST /api/clone-provenance/sign

Sign an arbitrary manifest supplied by the caller (for external orchestrators). Body: { source_catalog, destination_catalog, config, result, job_id? }.

POST /api/clone-provenance/verify

Verify a previously-signed manifest envelope. Returns { valid, reason }.


Playbooks

Trigger-driven automation — run actions on events (DQ failure, schema drift, anomaly, etc.) with rate-limiting and execution history. All endpoints under /api/playbooks/.

GET /api/playbooks

List all playbooks.

POST /api/playbooks

Create a playbook. Body: { name, description, trigger_type, trigger_config, conditions, actions, max_executions_per_hour }.

GET /api/playbooks/templates

List playbook templates.

GET /api/playbooks/{playbook_id}

Get a playbook by ID.

PUT /api/playbooks/{playbook_id}

Update a playbook.

DELETE /api/playbooks/{playbook_id}

Delete a playbook.

POST /api/playbooks/{playbook_id}/execute

Execute a playbook on demand (bypasses triggers).

GET /api/playbooks/{playbook_id}/history

Playbook execution history.


Streaming Clone Generator

Generate DLT pipeline specs and notebook SQL that materialize data from materialized views (MVs) and streaming tables. All endpoints under /api/streaming-clone-generator/.

POST /api/streaming-clone-generator/generate

Generate a DLT pipeline spec + notebook SQL. Body: { source_catalog, destination_catalog, schema_name, advanced_tables, target_schema?, pipeline_name? }.


Pipeline (multi-step orchestrator)

Multi-step clone pipelines — chain clone, mask, validate, notify, vacuum into a single declarative job. All endpoints under /api/pipeline/.

POST /api/pipeline/pipelines

Create a pipeline. Body: { name, description, steps: [{ type, name, config, on_failure }] }.

GET /api/pipeline/pipelines

List pipelines (optionally templates only).

GET /api/pipeline/pipelines/{pipeline_id}

Get a pipeline by ID.

DELETE /api/pipeline/pipelines/{pipeline_id}

Delete a pipeline.

POST /api/pipeline/pipelines/{pipeline_id}/run

Queue a pipeline run for asynchronous execution. Returns a job_id.

GET /api/pipeline/runs

List pipeline runs. Query: pipeline_id.

GET /api/pipeline/runs/{run_id}

Get run status.

POST /api/pipeline/runs/{run_id}/cancel

Cancel a pipeline run.

GET /api/pipeline/templates

List pipeline templates.

POST /api/pipeline/templates/{template_name}/create

Create a pipeline from a template with optional overrides.


Job Clone

Clone Databricks Jobs (workflows) within or across workspaces, with diff and backup/restore. All endpoints under /api/job-clone/.

GET /api/job-clone

List Databricks jobs. Query: name filter, limit.

GET /api/job-clone/{job_id}

Get job details by ID.

POST /api/job-clone/clone

Clone a job within the same workspace. Body: { job_id, new_name, overrides }.

POST /api/job-clone/clone-cross-workspace

Clone a job to a different workspace. Body: { job_id, dest_host, dest_token, new_name }.

POST /api/job-clone/diff

Compare two job definitions. Body: { job_id_a, job_id_b }.

POST /api/job-clone/backup

Back up job definitions. Body: { job_ids }.

POST /api/job-clone/restore

Restore jobs from a backup. Body: { definitions }.


Natural Language Rules

Parse natural-language descriptions into DQ rule configurations and generate English explanations of existing rules. All endpoints under /api/nl-rules/.

POST /api/nl-rules/from-natural-language

Parse a natural-language rule description into a structured DQ rule. Body: { text, table_fqn }.

POST /api/nl-rules/batch-parse

Parse multiple NL rules for one table. Body: { rules: [...], table_fqn }.

POST /api/nl-rules/explain

Generate an English explanation of a rule. Body: { rule }.