
FinOps & Cost Management

Cost dashboards, budgets, optimization recommendations, and the Cost of Poor Data Quality (COPQ) engine — all backed by Databricks system tables.

Overview

The FinOps portal at /finops/* aggregates billing and usage data from Databricks system tables (system.billing.usage, system.compute.warehouses, system.access.audit) into per-catalog and per-job rollups. Dashboard data is served from /api/finops/* and cached for ten minutes via TanStack Query, so repeat views within that window add no further warehouse cost.

Source modules:

Pages

| URL | What it shows |
| --- | --- |
| /finops | Overview: total cost, top catalogs, recent budget breaches |
| /finops/billing | Time-series billing pulled from system.billing.usage, sliceable by SKU / product / catalog |
| /finops/breakdown | Cost by purpose (clone vs sync vs reconciliation vs analysis vs other) |
| /finops/compute | Per-warehouse and per-cluster cost & utilization |
| /finops/query-costs | Top-N queries by cost over the last N days |
| /finops/job-costs | Per-job cost rollup with run history |
| /finops/warehouses | Idle warehouses, undersized warehouses, autoscale recommendations |
| /finops/storage | Per-catalog storage cost trend (uses catalog_size_history) |
| /finops/storage-optimization | Tables flagged for OPTIMIZE / VACUUM / Predictive Optimization with $ savings estimates |
| /finops/recommendations | Aggregated optimization opportunities |
| /finops/budgets | Per-catalog or per-team monthly budgets and breach history |
| /finops/trends | 30-day moving averages, MoM deltas, anomaly markers |
| /finops/copq | Cost of Poor Data Quality dashboard |
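The /finops/trends figures (30-day moving averages and MoM deltas) can be illustrated with a minimal sketch. The function names and the choice of a trailing window are assumptions made for illustration; the portal's actual calculation is not documented here.

```python
def moving_average(daily_costs, window=30):
    """Trailing moving average over a list of daily costs.

    Entry i averages the values in the last `window` days up to and
    including day i. Early entries average over the days seen so far.
    """
    averages = []
    running = 0.0
    for i, cost in enumerate(daily_costs):
        running += cost
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= daily_costs[i - window]
        averages.append(running / min(i + 1, window))
    return averages


def mom_delta_pct(current_month_total, previous_month_total):
    """Month-over-month change in percent; None when there is no baseline."""
    if previous_month_total == 0:
        return None
    change = current_month_total - previous_month_total
    return change / previous_month_total * 100.0
```

For example, `moving_average(costs)` over a list of daily totals yields one smoothed value per day, and `mom_delta_pct(1200, 1000)` reports a 20% increase.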

Configuration

The dashboards read system tables directly, so no extra configuration is needed beyond granting the user or service principal SELECT on system.billing.usage. For Azure-hosted workspaces, set AZURE_SUBSCRIPTION_ID to also include subscription-level Cost Management data on the /finops/billing page.

price_per_gb (default 0.023, set in Settings → FinOps) drives the storage-cost calculations in the Storage Optimization and Budgets dashboards.
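As a rough illustration of how price_per_gb feeds a storage-cost figure, the sketch below multiplies a size in gigabytes by the configured rate. The function name and the binary-gigabyte conversion (1 GB = 1024³ bytes) are assumptions; the portal may convert sizes differently.

```python
DEFAULT_PRICE_PER_GB = 0.023  # default value from Settings → FinOps


def storage_cost_usd(size_bytes, price_per_gb=DEFAULT_PRICE_PER_GB):
    """Estimate storage cost in USD for a given size in bytes.

    Assumes 1 GB = 1024**3 bytes (hypothetical; not confirmed by the docs).
    """
    size_gb = size_bytes / 1024**3
    return size_gb * price_per_gb


# A 500 GB catalog at the default rate costs roughly 11.50 USD.
estimate = storage_cost_usd(500 * 1024**3)
```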

Budgets

Create a budget from /finops/budgets:

name: prod-monthly
period: monthly
category: total # total | databricks | storage | compute
amount_usd: 5000
alert_threshold_pct: 80 # warn when actual > 80% of budget

Budgets are stored in localStorage (browser-side), so they are specific to the browser that created them and are not shared across users or devices. Breach detection runs against /api/finops/billing whenever the page is open.
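The threshold logic implied by alert_threshold_pct can be sketched as follows. The `Budget` class, the `budget_status` function, and the status labels ("ok", "warning", "breached") are hypothetical names chosen for illustration; the real check runs in the browser.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    period: str
    category: str               # total | databricks | storage | compute
    amount_usd: float
    alert_threshold_pct: float  # warn when actual exceeds this share of the budget


def budget_status(budget, actual_usd):
    """Classify actual spend against a budget (labels are illustrative)."""
    warn_at_usd = budget.amount_usd * budget.alert_threshold_pct / 100
    if actual_usd > budget.amount_usd:
        return "breached"
    if actual_usd > warn_at_usd:
        return "warning"
    return "ok"


prod = Budget("prod-monthly", "monthly", "total", 5000, 80)
status = budget_status(prod, 4200)  # 84% of budget, above the 80% threshold
```

Comparing against a dollar threshold rather than a computed percentage keeps the boundary case exact: spend of exactly 4000 USD on this budget is not above 80% and stays "ok".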

COPQ — Cost of Poor Data Quality

The COPQ engine quantifies the dollar cost of data-quality failures. Components:

| Cost component | How it's computed |
| --- | --- |
| Pipeline reruns | # DQ failures × hourly_engineer_rate × avg_rerun_hours |
| SLA breach penalty | Per-incident penalty from your SLA contracts |
| Engineer triage time | # incidents × hours_per_incident × hourly_engineer_rate |
| Downstream impact | direct_cost × downstream_multiplier (default 2.5×) |
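To make the component formulas concrete, here is a minimal worked sketch. It assumes that direct_cost is the sum of the first three components and that the total is direct cost plus downstream impact; neither is stated above, and the function and parameter names are illustrative rather than the actual src/copq.py API.

```python
def copq_breakdown(
    dq_failures,
    incidents,
    sla_breaches,
    hourly_engineer_rate,
    avg_rerun_hours,
    hours_per_incident,
    sla_penalty_per_incident,
    downstream_multiplier=2.5,
):
    """Compute each COPQ component in USD from the table's formulas."""
    reruns = dq_failures * hourly_engineer_rate * avg_rerun_hours
    sla = sla_breaches * sla_penalty_per_incident
    triage = incidents * hours_per_incident * hourly_engineer_rate
    # Assumption: direct cost is the sum of the three components above.
    direct = reruns + sla + triage
    downstream = direct * downstream_multiplier
    return {
        "pipeline_reruns": reruns,
        "sla_breach_penalty": sla,
        "engineer_triage": triage,
        "direct_cost": direct,
        "downstream_impact": downstream,
        # Assumption: total is direct cost plus downstream impact.
        "total": direct + downstream,
    }


example = copq_breakdown(
    dq_failures=4, incidents=3, sla_breaches=1,
    hourly_engineer_rate=150, avg_rerun_hours=0.5,
    hours_per_incident=2, sla_penalty_per_incident=500,
)
```

With these inputs the reruns cost 300, the SLA penalty 500, and triage 900, giving a direct cost of 1,700 USD and a downstream impact of 4,250 USD at the default multiplier.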

Configure the rates

curl -X POST "$CLXS_HOST/api/copq/compute" \
  -H 'Content-Type: application/json' \
  -d '{
    "hourly_rate": 150,
    "rerun_cost": 50,
    "sla_penalty": 500,
    "downstream_multiplier": 2.5
  }'

Defaults live in src/copq.py.
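The same request can be prepared from Python with only the standard library. This is a sketch, assuming the endpoint accepts a JSON body; the host below is a placeholder, and `build_copq_request` is a hypothetical helper, not part of Clone-Xs.

```python
import json
import urllib.request


def build_copq_request(host, rates):
    """Build (but do not send) a POST request to /api/copq/compute."""
    body = json.dumps(rates).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/copq/compute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


rates = {
    "hourly_rate": 150,
    "rerun_cost": 50,
    "sla_penalty": 500,
    "downstream_multiplier": 2.5,
}

# Placeholder host; substitute the value of CLXS_HOST.
request = build_copq_request("https://clone-xs.example.com", rates)

# To actually send it:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before it reaches the server.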

View the result

Three endpoints power the /finops/copq dashboard: GET /api/copq/summary, GET /api/copq/by-table, and GET /api/copq/trends. The trend chart shows weekly COPQ over the last 12 weeks. To compare a "before remediation playbook" line against an "after" line, tag incidents with playbook_run_id.
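One way to picture the before/after comparison is to bucket incident costs by ISO week and split them on whether playbook_run_id is set. This is a sketch under assumptions: the incident record shape (`day`, `cost_usd`) is hypothetical, and treating tagged incidents as the "after" series is one interpretation of the tagging, not confirmed behavior.

```python
from collections import defaultdict
from datetime import date


def weekly_copq(incidents):
    """Sum incident cost per ISO week, split into 'before' and 'after'.

    Each incident is a dict with 'day' (datetime.date), 'cost_usd' (float),
    and an optional 'playbook_run_id'. Tagged incidents count as 'after'.
    """
    totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for incident in incidents:
        iso = incident["day"].isocalendar()
        week = f"{iso[0]}-W{iso[1]:02d}"
        series = "after" if incident.get("playbook_run_id") else "before"
        totals[week][series] += incident["cost_usd"]
    # Sort by week label so the series is in chronological order.
    return dict(sorted(totals.items()))


sample = [
    {"day": date(2024, 1, 1), "cost_usd": 100.0},
    {"day": date(2024, 1, 3), "cost_usd": 50.0, "playbook_run_id": "run-1"},
    {"day": date(2024, 1, 8), "cost_usd": 25.0},
]
trend = weekly_copq(sample)
```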

Storage Optimization

/finops/storage-optimization reads system.compute.warehouses and information_schema.table_storage to surface tables that would benefit from:

  • OPTIMIZE — many small files, no recent compaction
  • VACUUM — deleted-but-retained data older than retention
  • Predictive Optimization — eligible tables not yet enrolled

Bulk-select tables and submit a job — Clone-Xs creates a single Databricks Job to run the maintenance, with progress streamed back via the job manager.