Onboarding a Database

Snowpack discovers Iceberg tables through a PyIceberg catalog and runs maintenance for tables that have explicitly opted in. Onboarding a new database is a two-step process: opt tables in at the catalog level, then register the database in the Helm chart so the orchestrator is allowed to automate it. Adding the database to health sync is recommended when you want cached health snapshots precomputed before orchestrator runs.

Step 1 — Opt tables in via Spark SQL

Each table must declare that it wants Snowpack maintenance by setting the snowpack.maintenance_enabled table property. Connect to Spark (or Kyuubi) and run:

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');

Replace <database> and <table> with the actual database and table names. Repeat for every table in the database that should receive automated maintenance.
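The opt-in check the orchestrator performs can be sketched in plain Python (a hypothetical helper for illustration; only the property name and the "exactly `true`" semantics come from this doc, and the real orchestrator code may differ):

```python
def is_maintenance_enabled(properties: dict) -> bool:
    """Return True only when the opt-in property is exactly the string 'true'.

    Per this doc, a missing property or any other value leaves the table ignored.
    """
    return properties.get("snowpack.maintenance_enabled") == "true"

# Example property maps as a catalog client might surface them
print(is_maintenance_enabled({"snowpack.maintenance_enabled": "true"}))   # True
print(is_maintenance_enabled({"snowpack.maintenance_enabled": "false"}))  # False
print(is_maintenance_enabled({}))                                         # False
```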

Per-table cadence override. By default the orchestrator respects the cluster-wide cadenceHours value (6 hours in dev). To override the cadence for a specific table, set the snowpack.maintenance_cadence_hours property at the same time:

ALTER TABLE lakehouse_dev.<database>.<table>
SET TBLPROPERTIES (
'snowpack.maintenance_enabled' = 'true',
'snowpack.maintenance_cadence_hours' = '12'
);

Tables without the maintenance_enabled property, or with it set to any value other than true, are ignored by the orchestrator.
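The cadence-resolution rule above can be sketched the same way (again a hypothetical helper; the property name and the 6-hour dev default come from this doc):

```python
def effective_cadence_hours(properties: dict, cluster_default: int = 6) -> int:
    """Resolve a table's maintenance cadence in hours.

    Uses the per-table override property when present, otherwise falls back
    to the cluster-wide cadenceHours value (6 hours in dev).
    """
    raw = properties.get("snowpack.maintenance_cadence_hours")
    return int(raw) if raw is not None else cluster_default

print(effective_cadence_hours({"snowpack.maintenance_cadence_hours": "12"}))  # 12
print(effective_cadence_hours({}))                                            # 6
```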

Step 2 — Add the database to Helm values

Open charts/snowpack/values-dev.yaml and add the database name to orchestrator.includeDatabases. Add it to healthSync.databases as well when you want the 15-minute health-sync CronJob to precompute cached health for that database. These values are comma-separated strings:

healthSync:
  databases: "offer_service,points_service,<new_database>"
orchestrator:
  includeDatabases: "offer_service,points_service,<new_database>"
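Because both values are comma-separated strings rather than YAML lists, consumers have to split them. A minimal sketch of that parsing (illustrative only; the real chart templates may handle this differently):

```python
def parse_database_list(value: str) -> list[str]:
    """Split a comma-separated Helm value into database names,
    tolerating stray whitespace and empty segments."""
    return [name.strip() for name in value.split(",") if name.strip()]

print(parse_database_list("offer_service, points_service"))
# ['offer_service', 'points_service']
```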

Step 3 — Deploy via Terraform

All Snowpack infrastructure changes are deployed through Terraform. Never run helm install or helm upgrade directly — Terraform owns the Helm release and direct Helm commands cause state drift.

terraform apply

If you modified any files under charts/snowpack/templates/, remember to bump the version field in charts/snowpack/Chart.yaml as well. Terraform detects chart changes by comparing the chart version; template-only edits without a version bump are invisible to the plan.

Step 4 — Verify

After Terraform applies successfully, wait for the next orchestrator CronJob run. In the dev environment the orchestrator runs hourly, at :30 past the hour.
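For estimating how long to wait, the next firing can be computed from the stated schedule (assuming a `30 * * * *` cron expression, which matches "hourly at :30"; this helper is illustrative, not part of Snowpack):

```python
from datetime import datetime, timedelta

def next_orchestrator_run(now: datetime) -> datetime:
    """Next :30-past-the-hour firing, assuming a '30 * * * *' cron schedule."""
    candidate = now.replace(minute=30, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate

print(next_orchestrator_run(datetime(2024, 1, 1, 10, 45)))  # 2024-01-01 11:30:00
```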

Check recent orchestrator runs to confirm the new database’s tables were assessed:

curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'

A successful run includes tables_assessed, jobs_submitted, and jobs_completed counts. If the new tables do not appear, verify that:

  1. The table property snowpack.maintenance_enabled is set to true in the catalog.
  2. The database is listed in orchestrator.includeDatabases in the deployed values.
  3. If you expect cached health, the database is listed in healthSync.databases and the health-sync CronJob has completed at least one cycle since the deploy (runs every 15 minutes).
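The three checks above can be condensed into a quick triage helper (hypothetical; names and signature are illustrative, not part of Snowpack):

```python
def onboarding_gaps(table_props: dict, include_dbs: str, health_dbs: str,
                    database: str, expect_cached_health: bool = False) -> list[str]:
    """Return the verification-checklist items that are not satisfied.

    An empty list means all applicable conditions hold.
    """
    gaps = []
    if table_props.get("snowpack.maintenance_enabled") != "true":
        gaps.append("snowpack.maintenance_enabled is not 'true'")
    if database not in [d.strip() for d in include_dbs.split(",")]:
        gaps.append("missing from orchestrator.includeDatabases")
    if expect_cached_health and database not in [d.strip() for d in health_dbs.split(",")]:
        gaps.append("missing from healthSync.databases")
    return gaps

print(onboarding_gaps({"snowpack.maintenance_enabled": "true"},
                      "offer_service,points_service", "offer_service",
                      database="points_service", expect_cached_health=True))
# ['missing from healthSync.databases']
```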

You can also confirm a specific table is visible in the cache:

curl -s https://<snowpack-host>/tables?database=<new_database>

This returns the list of tables Snowpack knows about for that database. If the list is empty, the API table-cache sync has not discovered the database yet or the catalog cannot list it.