Storage Metrics

Analyze per-table storage breakdown using Databricks' ANALYZE TABLE ... COMPUTE STORAGE METRICS command. This feature helps platform admins identify tables with significant reclaimable storage, understand why total storage differs from active table size, and optimize costs.

Requires Databricks Runtime 18.0+.

What It Measures

The command returns four storage categories per table:

Category	Description
Total	Full storage footprint including delta log, active data, vacuumable files, and time-travel files
Active	Data files actively referenced by the current table version
Vacuumable	Data that can be removed by running `VACUUM` or enabling predictive optimization
Time Travel	Historical data used for rollbacks and time-travel queries (tombstoned/failsafe bytes)

From the Web UI

Navigate to Analysis > Storage Metrics.

Select a catalog from the dropdown
Optionally filter by schema and/or table
Click Analyze Storage

The page displays:

4 summary cards — Total, Active, Vacuumable, and Time Travel with percentages and file counts
Top Reclaimable Tables — The 10 tables with the most vacuumable storage (these are candidates for VACUUM)
Detail table — Per-table breakdown with all metrics and conditional coloring:
- Green: < 10% vacuumable
- Yellow: 10–30% vacuumable
- Red: > 30% vacuumable

From the CLI

Full catalog analysis

clxs storage-metrics --source my_catalog

Output:

======================================================================
STORAGE METRICS: my_catalog
======================================================================
  Schemas:          5
  Tables:           42
  Total storage:    5.00 GB (1250 files)
  Active data:      4.00 GB (80.0%)
  Vacuumable:       768.0 MB (15.0%)
  Time travel:      256.0 MB (5.0%)

  Per-schema breakdown:
    sales                            12 tables     2.50 GB  vacuum:    400.0 MB (16.0%)
    marketing                         8 tables     1.20 GB  vacuum:    200.0 MB (16.7%)
    ...

  Top 10 tables by reclaimable storage:
    sales.transactions                              300.0 MB (20.0% of 1.50 GB)
    marketing.events                                150.0 MB (25.0% of 600.0 MB)
    ...
======================================================================

Filter to a single schema

clxs storage-metrics --source my_catalog --schema sales

Filter to a single table

clxs storage-metrics --source my_catalog --schema sales --table transactions

CLI Options

Flag	Description	Default
`--source`	Source catalog name	From config
`--schema`	Filter to a specific schema	All schemas
`--table`	Filter to a specific table (requires `--schema`)	All tables

API Endpoint

POST /api/storage-metrics

Request Body

{
  "source_catalog": "my_catalog",
  "schema_filter": "sales",
  "table_filter": "transactions"
}

Both schema_filter and table_filter are optional. Omit them to analyze the full catalog.

Response

{
  "catalog": "my_catalog",
  "num_schemas": 5,
  "num_tables": 42,
  "total_bytes": 5368709120,
  "total_display": "5.00 GB",
  "num_total_files": 1250,
  "active_bytes": 4294967296,
  "active_display": "4.00 GB",
  "active_pct": 80.0,
  "vacuumable_bytes": 805306368,
  "vacuumable_display": "768.0 MB",
  "vacuumable_pct": 15.0,
  "time_travel_bytes": 268435456,
  "time_travel_display": "256.0 MB",
  "time_travel_pct": 5.0,
  "tables": [...],
  "schema_summaries": [...],
  "top_tables_by_vacuumable": [...],
  "top_tables_by_total": [...]
}

Performance Notes

ANALYZE TABLE ... COMPUTE STORAGE METRICS uses a recursive file listing. Execution time is typically within minutes per table but can take longer for very large tables.
Clone-Xs runs the command in parallel across tables (controlled by max_parallel_queries in config).
A progress bar is shown in the CLI during analysis.

Table Type Considerations

Materialized views and streaming tables: total_bytes includes metadata; active_bytes excludes vacuumable/time-travel portions.
Shallow clones: total_bytes includes only the clone's metadata and delta log. active_bytes is zero because the clone references the source table's data files.

OPTIMIZE and VACUUM

After analyzing storage, you can run OPTIMIZE and VACUUM directly on selected tables.

From the Web UI

Run Analyze Storage to see per-table metrics
Select tables using checkboxes (or "Select All")
Click OPTIMIZE or VACUUM in the action bar
For VACUUM, set retention hours (default: 168 = 7 days)
Confirm execution — results show success/failure per table

Predictive Optimization

If Predictive Optimization is enabled, a blue info banner warns that OPTIMIZE and VACUUM may run automatically. You can skip manual maintenance in this case.

From the CLI

# OPTIMIZE
clxs optimize --source my_catalog                       # All tables
clxs optimize --source my_catalog --schema sales        # One schema
clxs optimize --source my_catalog --schema sales --table orders  # One table
clxs optimize --source my_catalog --dry-run             # Preview only

# VACUUM
clxs vacuum --source my_catalog                         # All tables, 7-day retention
clxs vacuum --source my_catalog --retention-hours 48    # Custom retention
clxs vacuum --source my_catalog --schema sales          # One schema
clxs vacuum --source my_catalog --dry-run               # Preview only

API Endpoints

POST /api/optimize
POST /api/vacuum
POST /api/check-predictive-optimization

Request body for OPTIMIZE/VACUUM:

{
  "source_catalog": "my_catalog",
  "tables": [{"schema": "sales", "table": "orders"}],
  "retention_hours": 168,
  "dry_run": false
}

Omit tables to run on all tables. The retention_hours field is only used by VACUUM.

Common Use Cases

Analyze then act — Run storage metrics, identify reclaimable tables, then OPTIMIZE/VACUUM directly
Storage cost optimization — Find tables where time-travel or old versions are consuming significant storage
Post-clone audit — Verify storage footprint of cloned catalogs
Periodic reporting — Run on a schedule (via Create Job) to track storage trends over time
Skip if PO enabled — Check Predictive Optimization status before running manual maintenance

What It Measures​

From the Web UI​

From the CLI​

Full catalog analysis​

Filter to a single schema​

Filter to a single table​

CLI Options​

API Endpoint​

Request Body​

Response​

Performance Notes​

Table Type Considerations​

OPTIMIZE and VACUUM​

From the Web UI​

Predictive Optimization​

From the CLI​

API Endpoints​

Common Use Cases​

What It Measures

From the Web UI

From the CLI

Full catalog analysis

Filter to a single schema

Filter to a single table

CLI Options

API Endpoint

Request Body

Response

Performance Notes

Table Type Considerations

OPTIMIZE and VACUUM

From the Web UI

Predictive Optimization

From the CLI

API Endpoints

Common Use Cases