Onboarding a Database

Snowpack discovers Iceberg tables through a PyIceberg catalog and, in opt-out mode (the default), maintains every table in an allowlisted database unless a table opts out. Onboarding a new database requires only one change: register the database in the Helm chart so the orchestrator is allowed to automate it, then deploy and verify. The same entry brings the database into health-sync scope, so cached health snapshots are precomputed before orchestrator runs. Optionally, opt individual tables out or tune their cadence.

Step 1 — Add the database to Helm values

Open charts/snowpack/values-dev.yaml (or the values file for the target environment) and add the database name to the single databases value. This one list feeds both the orchestrator (automated maintenance) and the 15-minute health-sync CronJob (cached health snapshots). It is a comma-separated string:

databases: "offer_service,points_service,<new_database>"
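Because the value is a single comma-separated string rather than a YAML list, it is easy to introduce a duplicate or a stray space when editing it by hand. The following is a minimal sketch of appending a name safely; the helper `add_database` and the example name `loyalty_service` are hypothetical and not part of Snowpack.

```python
def add_database(databases: str, new_database: str) -> str:
    """Return the comma-separated allowlist with new_database present exactly once."""
    # Split on commas, trim whitespace, and drop empty entries.
    names = [name.strip() for name in databases.split(",") if name.strip()]
    # Append only if the database is not already allowlisted.
    if new_database not in names:
        names.append(new_database)
    return ",".join(names)


# "loyalty_service" is a made-up database name used for illustration.
print(add_database("offer_service,points_service", "loyalty_service"))
```

The result can be pasted back into the `databases` value of the target values file.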

Once the database is allowlisted and deployed, every table in it is maintained automatically — there is no per-table opt-in step.

Step 2 — (Optional) Opt out or tune individual tables

Tables in an allowlisted database are maintained by default. To exclude a specific table, opt it out via Spark SQL (or Kyuubi):

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'false');

To hard-exclude a table from all maintenance regardless of mode (for example, during a migration), set the compaction_skip property to 'true' instead.
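The opt-out rules described above can be summarized as a small predicate. This is an illustrative sketch only, not Snowpack's actual implementation: the function name `is_maintained` is hypothetical, it models opt-out mode only, and treating property values case-insensitively is an assumption.

```python
def is_maintained(database: str, properties: dict, allowlist: set) -> bool:
    """Model of the opt-out rules: allowlisted database, no opt-out properties."""
    # compaction_skip hard-excludes the table regardless of mode.
    if properties.get("compaction_skip", "").lower() == "true":
        return False
    # Tables outside the databases allowlist are never automated.
    if database not in allowlist:
        return False
    # In opt-out mode a table is maintained unless it explicitly opts out.
    enabled = properties.get("snowpack.maintenance_enabled", "true")
    return enabled.lower() != "false"


allowlist = {"offer_service", "points_service"}
print(is_maintained("offer_service", {}, allowlist))
print(is_maintained("offer_service", {"snowpack.maintenance_enabled": "false"}, allowlist))
```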

Per-table cadence override. By default the orchestrator respects the cluster-wide cadenceHours value (6 hours in dev). To override the cadence for a specific table, set the snowpack.maintenance_cadence_hours property:

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES ('snowpack.maintenance_cadence_hours' = '12');
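As a rough illustration of how the override relates to the cluster-wide default, the sketch below resolves the effective cadence for a table from its properties. The function name is hypothetical, and how Snowpack handles a non-numeric property value is not documented here; this sketch simply lets `int()` raise.

```python
DEFAULT_CADENCE_HOURS = 6  # cluster-wide cadenceHours in dev, per the text above


def effective_cadence_hours(properties: dict, default: int = DEFAULT_CADENCE_HOURS) -> int:
    """Return the per-table cadence if set, otherwise the cluster-wide default."""
    raw = properties.get("snowpack.maintenance_cadence_hours")
    if raw is None:
        return default
    # Table properties are stored as strings, so convert explicitly.
    return int(raw)


print(effective_cadence_hours({}))
print(effective_cadence_hours({"snowpack.maintenance_cadence_hours": "12"}))
```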

Step 3 — Deploy via Terraform

All Snowpack infrastructure changes are deployed through Terraform. Never run helm install or helm upgrade directly — Terraform owns the Helm release and direct Helm commands cause state drift.

terraform apply

If you modified any files under charts/snowpack/templates/, remember to bump the version field in charts/snowpack/Chart.yaml as well. Terraform detects chart changes by comparing the chart version; template-only edits without a version bump are invisible to the plan. (Editing a values-*.yaml file does not require a version bump — Terraform reads the values file directly.)
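The version-bump rule can be checked mechanically against a list of changed paths, for example in a pre-commit hook. This is a hedged sketch: the helper name and the file name `cronjob.yaml` are hypothetical, and the paths are assumed to be relative to the repository root.

```python
from pathlib import PurePosixPath

TEMPLATES_DIR = PurePosixPath("charts/snowpack/templates")


def needs_chart_version_bump(changed_paths: list) -> bool:
    """True if any changed file lives under the chart's templates directory."""
    return any(TEMPLATES_DIR in PurePosixPath(path).parents for path in changed_paths)


# A values-only edit does not need a bump; a template edit does.
print(needs_chart_version_bump(["charts/snowpack/values-dev.yaml"]))
print(needs_chart_version_bump(["charts/snowpack/templates/cronjob.yaml"]))
```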

Step 4 — Verify

After Terraform applies successfully, wait for the next orchestrator CronJob run. In the dev environment the orchestrator runs hourly, at 30 minutes past the hour.
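To estimate how long to wait, the next run time for an hourly schedule at minute 30 can be computed as in the sketch below. The function name is hypothetical, and the result is in whatever timezone `now` is expressed in; the cluster's schedule timezone is not stated here.

```python
from datetime import datetime, timedelta


def next_orchestrator_run(now: datetime) -> datetime:
    """Return the next time strictly after `now` that falls at :30 past an hour."""
    candidate = now.replace(minute=30, second=0, microsecond=0)
    # If :30 of the current hour has already passed (or is exactly now), use the next hour.
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


now = datetime(2024, 1, 1, 10, 45)
print(next_orchestrator_run(now))  # 2024-01-01 11:30:00
```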

Check recent orchestrator runs to confirm the new database’s tables were assessed:

curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'

A successful run includes tables_assessed, jobs_submitted, and jobs_completed counts. If the new tables do not appear, verify that:

  1. The table appears in the API table cache (GET /tables).
  2. The database is listed in the databases value in the deployed values.
  3. The table has not opted out (snowpack.maintenance_enabled is not false and compaction_skip is not true).
  4. If you expect cached health, the health-sync CronJob has completed at least one cycle since the deploy (runs every 15 minutes). The same databases value controls health-sync scope.
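The run check can also be scripted. The sketch below parses a saved response from the orchestrator runs endpoint and extracts the three counts named above. It assumes the response is a JSON array with the most recent run first (as the `jq '.[0]'` filter suggests); the function name and the sample values are hypothetical.

```python
import json

REQUIRED_KEYS = ("tables_assessed", "jobs_submitted", "jobs_completed")


def summarize_latest_run(runs_json: str) -> dict:
    """Return the three run counts from the first entry of the runs response."""
    runs = json.loads(runs_json)
    if not runs:
        raise ValueError("no orchestrator runs returned")
    latest = runs[0]
    missing = [key for key in REQUIRED_KEYS if key not in latest]
    if missing:
        raise KeyError(f"latest run is missing fields: {missing}")
    return {key: latest[key] for key in REQUIRED_KEYS}


# Made-up sample payload for illustration; real responses may carry more fields.
sample = '[{"tables_assessed": 42, "jobs_submitted": 5, "jobs_completed": 5}]'
print(summarize_latest_run(sample))
```

A `tables_assessed` count that does not grow after onboarding is a hint to work through the checklist above.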

You can also confirm a specific table is visible in the cache:

curl -s "https://<snowpack-host>/tables?database=<new_database>"

This returns the list of tables Snowpack knows about for that database. Each entry includes in_maintenance_allowlist: true/false indicating whether the table’s database is in the databases allowlist. If the list is empty, the API table-cache sync has not discovered the database yet or the catalog cannot list it.
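The three outcomes described above (empty list, allowlisted, not allowlisted) can be distinguished with a short script over a saved response. This is a sketch under assumptions: the response is taken to be a JSON array of objects, the `table` field in the sample is hypothetical, and only `in_maintenance_allowlist` comes from the description above.

```python
import json


def allowlist_status(tables_json: str) -> str:
    """Classify a /tables response as 'empty', 'allowlisted', or 'not_allowlisted'."""
    tables = json.loads(tables_json)
    if not tables:
        # The cache sync has not discovered the database, or the catalog cannot list it.
        return "empty"
    if all(entry.get("in_maintenance_allowlist") is True for entry in tables):
        return "allowlisted"
    return "not_allowlisted"


# Made-up sample payload for illustration.
sample = '[{"table": "orders", "in_maintenance_allowlist": true}]'
print(allowlist_status(sample))
```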

In the web UI (/ui), databases outside the maintenance allowlist appear greyed out with their tables disabled. After a successful onboard, the database should no longer be greyed.