Skip to content

Data Model

All Snowpack state lives in a single Postgres database. This page documents the hot-path queue/lock/cache tables plus the durable history tables created by HistoryStore.ensure_schema().

jobs

The primary record for each maintenance request. One row per job.

ColumnTypeConstraintsDescription
job_idTEXTPKUnique job identifier
database_nameTEXTNOT NULLIceberg database name
table_nameTEXTNOT NULLIceberg table name
actionsTEXT[]NOT NULLOrdered list of maintenance actions
dry_runBOOLNOT NULLWhether to simulate without writing
statusTEXTNOT NULLOne of: pending, running, completed, failed, cancelled
submitted_atTIMESTAMPTZNOT NULLWhen the job was submitted
started_atTIMESTAMPTZWhen the worker began execution
completed_atTIMESTAMPTZWhen the job reached terminal state
errorTEXTError message if failed
created_atTIMESTAMPTZDEFAULT NOW()Row creation timestamp
attempt_idTEXTDL-197 fence — current claim’s unique ID
deleted_atTIMESTAMPTZSoft-delete tombstone

job_queue

Transient work queue. Rows exist only while a job is in-flight (pending or running). Deleted on ack.

ColumnTypeConstraintsDescription
job_idTEXTPK, FK -> jobsReferences the parent job
enqueued_atTIMESTAMPTZDEFAULT NOW()When the job entered the queue
visible_atTIMESTAMPTZNOT NULLClaim lease deadline — job becomes re-claimable after this time
claimed_atTIMESTAMPTZWhen a worker claimed the job
claimed_byTEXTWorker hostname
attempt_idTEXTFence echo — must match jobs.attempt_id
attempt_countINTDEFAULT 0Number of claim attempts

job_actions

Per-action results within a job. One row per action execution.

ColumnTypeConstraintsDescription
idBIGINTPK, identityUnique action-result identifier
job_idTEXTFK -> jobsParent job
actionTEXTNOT NULLAction name (e.g., rewrite_data_files, expire_snapshots)
successBOOLNOT NULLWhether the action succeeded
messageTEXTNOT NULL, DEFAULT ''Human-readable result summary
errorTEXTError message if the action failed
elapsed_secondsDOUBLE PRECISIONNOT NULL, DEFAULT 0.0Wall-clock duration of the action
created_atTIMESTAMPTZNOT NULL, DEFAULT NOW()Row creation timestamp

table_locks

Per-table ownership lock. Prevents concurrent maintenance on the same table.

ColumnTypeConstraintsDescription
table_keyTEXTPK"{database}.{table}" composite key
holderTEXTNOT NULLjob_id of the lock owner
acquired_atTIMESTAMPTZDEFAULT NOW()When the lock was acquired
expires_atTIMESTAMPTZNOT NULLLock expiration deadline

table_cache_entries

Cached table inventory from the catalog. Refreshed atomically by TableCacheSyncWorker.

ColumnTypeConstraintsDescription
database_nameTEXTIceberg database name
table_nameTEXTIceberg table name
maintenance_enabledBOOLnullableWhether maintenance is enabled for this table
payloadJSONBFull table metadata payload

Primary key: (database_name, table_name)

table_cache_meta

Single-row metadata about the table cache sync state.

ColumnTypeConstraintsDescription
idINTPK, CHECK (id = 1)Single-row invariant
last_synced_atTIMESTAMPTZLast successful sync timestamp

health_snapshots

Health reports persisted by live GET /tables/{database}/{table}/health requests and by the health-sync worker/CronJob. Cached health reads use the newest row per table.

ColumnTypeConstraintsDescription
idBIGINTPK, identityUnique snapshot identifier
database_nameTEXTNOT NULLIceberg database name
table_nameTEXTNOT NULLIceberg table name
total_data_filesINTEGERNOT NULL, DEFAULT 0Current data-file count
total_delete_filesINTEGERNOT NULL, DEFAULT 0Current delete-file count
total_size_bytesBIGINTNOT NULL, DEFAULT 0Total data size
avg_file_size_bytesDOUBLE PRECISIONNOT NULL, DEFAULT 0.0Average data-file size
small_file_countINTEGERNOT NULL, DEFAULT 0Number of small files
small_file_pctDOUBLE PRECISIONNOT NULL, DEFAULT 0.0Percent of data files considered small
snapshot_countINTEGERNOT NULL, DEFAULT 0Snapshot count
manifest_countINTEGERNOT NULL, DEFAULT 0Manifest count
needs_maintenanceBOOLEANNOT NULLWhether any threshold recommends work
maintenance_enabledBOOLEANnullableTable opt-in flag; NULL means unset/unrecognized
maintenance_cadence_hoursINTEGERPer-table cadence override
recommended_actionsTEXT[]NOT NULL, DEFAULT '{}'Ordered maintenance actions
errorTEXTNon-fatal analyzer error
total_recordsBIGINTDEFAULT 0Current record count
snapshot_idBIGINTDEFAULT 0Current snapshot ID
health_statusTEXTDEFAULT 'Unknown'Summary health label
hours_since_last_snapshotDOUBLE PRECISIONDEFAULT 0Age of the current snapshot
oldest_snapshot_age_hoursDOUBLE PRECISIONDEFAULT 0Age of the oldest retained snapshot
checked_atTIMESTAMPTZNOT NULL, DEFAULT NOW()Collection timestamp

orchestrator_runs

Audit log of orchestrator CronJob executions.

ColumnTypeConstraintsDescription
idBIGINTPK, identityUnique run identifier
started_atTIMESTAMPTZNOT NULLWhen the run started
completed_atTIMESTAMPTZNOT NULLWhen the run completed
statusTEXTNOT NULLRun outcome
tables_assessedINTEGERNOT NULL, DEFAULT 0Tables evaluated
tables_skippedINTEGERNOT NULL, DEFAULT 0Tables skipped
tables_healthyINTEGERNOT NULL, DEFAULT 0Tables already healthy
jobs_submittedINTEGERNOT NULL, DEFAULT 0Jobs submitted by the run
jobs_completedINTEGERNOT NULL, DEFAULT 0Submitted jobs completed
jobs_failedINTEGERNOT NULL, DEFAULT 0Submitted jobs failed
dry_runBOOLEANNOT NULL, DEFAULT FALSEWhether the run was advisory only
errorTEXTRun-level error
created_atTIMESTAMPTZNOT NULL, DEFAULT NOW()Row creation timestamp

Key indexes

IndexDefinitionPurpose
idx_jobs_table(database_name, table_name, submitted_at DESC)Table-scoped job history lookups
idx_jobs_status(status, submitted_at DESC)Status-filtered job lists
idx_jobs_submitted(submitted_at DESC)Recent-job ordering
idx_jobs_visiblePartial index WHERE deleted_at IS NULLSupports the live-read joined view — excludes soft-deleted rows so the API’s job listing query only scans active jobs
idx_job_actions_job_id(job_id)Fetches per-job action results
idx_job_actions_action(action, created_at DESC)Action-level history queries
idx_job_queue_readyPartial index on (visible_at) WHERE claimed_at IS NULLServes KEDA’s scaler query and the worker’s dequeue_with_claim CTE
idx_job_queue_claimedPartial index on (visible_at) WHERE claimed_at IS NOT NULLSupports stale-claim sweeps
idx_table_locks_expires(expires_at)Finds expired table locks
idx_table_cache_maintenancePartial index on (maintenance_enabled) WHERE maintenance_enabled = TRUESpeeds opted-in table listing
idx_health_table(database_name, table_name, checked_at DESC)Fetches latest cached health per table
idx_health_checked_at(checked_at DESC)Retention and recent-health scans
idx_orchestrator_runs_started(started_at DESC)Recent orchestrator run listing