Onboarding a Database
Snowpack discovers Iceberg tables through a PyIceberg catalog and, in opt-out mode (the default), maintains every table in an allowlisted database unless a table opts out. Onboarding a new database requires one change: register the database in the Helm chart so the orchestrator is allowed to automate it. The same entry puts the database in health-sync scope, so cached health snapshots are precomputed before orchestrator runs. Optionally, opt individual tables out.
Step 1 — Add the database to Helm values
Open charts/snowpack/values-dev.yaml (or the values file for the target
environment) and add the database name to the single databases value. This one
list feeds both the orchestrator (automated maintenance) and the 15-minute
health-sync CronJob (cached health snapshots). It is a comma-separated string:
databases: "offer_service,points_service,<new_database>"
Once the database is allowlisted and deployed, every table in it is maintained automatically; there is no per-table opt-in step.
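Because the value is a single comma-separated string rather than a YAML list, it is easy to introduce a duplicate or stray whitespace when editing it by hand. The following is a minimal sketch of how the new value could be computed; the helper name add_database and the example name loyalty_service are hypothetical and not part of Snowpack.

```python
def add_database(databases: str, new_database: str) -> str:
    """Return the comma-separated allowlist with new_database appended once.

    Hypothetical helper for illustration; Snowpack does not ship it.
    """
    # Split on commas, trimming whitespace and dropping empty entries.
    names = [name.strip() for name in databases.split(",") if name.strip()]
    new_database = new_database.strip()
    if not new_database:
        raise ValueError("new_database must be a non-empty name")
    if "," in new_database:
        raise ValueError("a database name cannot contain a comma")
    # Preserve the existing order; append only if the name is not listed yet.
    if new_database not in names:
        names.append(new_database)
    return ",".join(names)


if __name__ == "__main__":
    # "loyalty_service" is a made-up database name.
    print(add_database("offer_service,points_service", "loyalty_service"))
```

The result can be pasted back into the databases value of the values file for the target environment.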
Step 2 — (Optional) Opt out or tune individual tables
Tables in an allowlisted database are maintained by default. To exclude a specific table, opt it out via Spark SQL (or Kyuubi):
ALTER TABLE lakehouse_dev.<database>.<table> SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'false');
Use compaction_skip = 'true' instead to hard-exclude a table from all maintenance
regardless of mode (for example, during a migration).
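The interaction between the allowlist and the two table properties can be modeled as a small decision function. This is a sketch of the opt-out-mode rules as described on this page, not Snowpack's actual implementation; the function name is_maintained and the case-insensitive comparison of property values are assumptions.

```python
from typing import Mapping


def is_maintained(database: str, allowlist: set[str], properties: Mapping[str, str]) -> bool:
    """Model the opt-out-mode decision for one table (illustrative only)."""
    # compaction_skip = 'true' hard-excludes the table regardless of mode.
    # Assumption: property values are compared case-insensitively.
    if properties.get("compaction_skip", "").strip().lower() == "true":
        return False
    # Only tables in an allowlisted database are candidates for maintenance.
    if database not in allowlist:
        return False
    # In opt-out mode a table is maintained unless it explicitly opts out.
    enabled = properties.get("snowpack.maintenance_enabled", "").strip().lower()
    return enabled != "false"


if __name__ == "__main__":
    allow = {"offer_service", "points_service"}
    print(is_maintained("offer_service", allow, {}))
    print(is_maintained("offer_service", allow, {"snowpack.maintenance_enabled": "false"}))
```

A table with no Snowpack properties at all is maintained as soon as its database is allowlisted, which is why no per-table opt-in is needed.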
Per-table cadence override. By default the orchestrator respects the
cluster-wide cadenceHours value (6 hours in dev). To override the cadence for a
specific table, set the snowpack.maintenance_cadence_hours property:
ALTER TABLE lakehouse_dev.<database>.<table> SET TBLPROPERTIES ('snowpack.maintenance_cadence_hours' = '12');
Step 3 — Deploy via Terraform
All Snowpack infrastructure changes are deployed through Terraform. Never run
helm install or helm upgrade directly — Terraform owns the Helm release
and direct Helm commands cause state drift.
terraform apply
If you modified any files under charts/snowpack/templates/, remember to bump
the version field in charts/snowpack/Chart.yaml as well. Terraform detects
chart changes by comparing the chart version; template-only edits without a
version bump are invisible to the plan. (Editing a values-*.yaml file does not
require a version bump — Terraform reads the values file directly.)
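A forgotten version bump is easy to miss in review, so the bump can be scripted. The sketch below increments the patch component of the top-level version field in the text of a Chart.yaml. It assumes an unquoted MAJOR.MINOR.PATCH version with no pre-release suffix; the function name and the sample chart contents are hypothetical.

```python
import re

# Match only the top-level "version:" key. Anchoring at line start keeps
# "appVersion:" and "apiVersion:" from matching.
_VERSION_LINE = re.compile(r"^version:[ \t]*(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE)


def bump_chart_patch(chart_yaml: str) -> str:
    """Return the Chart.yaml text with the patch version incremented by one."""

    def _bump(match: re.Match) -> str:
        major, minor, patch = (int(group) for group in match.groups())
        return f"version: {major}.{minor}.{patch + 1}"

    bumped, count = _VERSION_LINE.subn(_bump, chart_yaml, count=1)
    if count == 0:
        raise ValueError("no unquoted MAJOR.MINOR.PATCH version line found")
    return bumped


if __name__ == "__main__":
    # Made-up chart contents for illustration.
    sample = 'apiVersion: v2\nname: snowpack\nversion: 0.3.1\nappVersion: "1.4.0"\n'
    print(bump_chart_patch(sample))
```

The function works on text rather than a parsed YAML document so that comments and key order in the file are left untouched.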
Step 4 — Verify
After Terraform applies successfully, wait for the next scheduled orchestrator run. In the
dev environment the orchestrator CronJob runs hourly, at 30 minutes past the hour.
Check recent orchestrator runs to confirm the new database’s tables were assessed:
curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'
A successful run includes tables_assessed, jobs_submitted, and
jobs_completed counts. If the new tables do not appear, verify that:
- The table appears in the API table cache (GET /tables).
- The database is listed in the databases value in the deployed values.
- The table has not opted out (snowpack.maintenance_enabled is not false and compaction_skip is not true).
- If you expect cached health, the health-sync CronJob has completed at least one cycle since the deploy (runs every 15 minutes). The same databases value controls health-sync scope.
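The run check can also be scripted instead of read by eye. The sketch below extracts the three counts from the body returned by /orchestrator/runs. It assumes the body is a JSON array with the most recent run first, as the jq '.[0]' filter suggests; the function name and the sample payload are made up for illustration.

```python
import json

REQUIRED_COUNTS = ("tables_assessed", "jobs_submitted", "jobs_completed")


def latest_run_counts(runs_json: str) -> dict[str, int]:
    """Return the three counts of the first run in the response body."""
    runs = json.loads(runs_json)
    # Assumption: the endpoint returns a JSON array, newest run first.
    if not isinstance(runs, list) or not runs:
        raise ValueError("expected a non-empty JSON array of runs")
    latest = runs[0]
    missing = [key for key in REQUIRED_COUNTS if key not in latest]
    if missing:
        raise KeyError("latest run is missing: " + ", ".join(missing))
    return {key: int(latest[key]) for key in REQUIRED_COUNTS}


if __name__ == "__main__":
    # Made-up payload; real runs may carry additional fields.
    body = '[{"tables_assessed": 12, "jobs_submitted": 3, "jobs_completed": 3}]'
    print(latest_run_counts(body))
```

If tables_assessed does not increase after onboarding, work through the checklist above.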
You can also confirm a specific table is visible in the cache:
curl -s "https://<snowpack-host>/tables?database=<new_database>"
This returns the list of tables Snowpack knows about for that database. Each
entry includes in_maintenance_allowlist: true/false indicating whether the
table’s database is in the databases allowlist. If the list is empty,
the API table-cache sync has not discovered the database yet or the catalog
cannot list it.
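The three outcomes described above (empty list, allowlisted, not allowlisted) can be told apart mechanically. This is a minimal sketch that assumes the response body decodes to a list of objects, each carrying the in_maintenance_allowlist field; the function name and the status labels are hypothetical.

```python
def describe_table_cache(entries: list[dict]) -> str:
    """Classify a decoded /tables?database=... response (illustrative only)."""
    # An empty list means the cache sync has not discovered the database
    # yet, or the catalog cannot list it.
    if not entries:
        return "empty"
    # The flag reflects the database's allowlist membership, so every entry
    # for one database is expected to carry the same value.
    if all(entry.get("in_maintenance_allowlist") is True for entry in entries):
        return "allowlisted"
    return "not_allowlisted"


if __name__ == "__main__":
    print(describe_table_cache([]))
    print(describe_table_cache([{"in_maintenance_allowlist": True}]))
    print(describe_table_cache([{"in_maintenance_allowlist": False}]))
```

A result of not_allowlisted after a deploy points back at the databases value in the deployed values file.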
In the web UI (/ui), databases outside the maintenance allowlist appear
greyed out with their tables disabled. After a successful onboard, the database
should no longer be greyed.