Onboarding a Database
Snowpack discovers Iceberg tables through a PyIceberg catalog and runs maintenance for tables that have explicitly opted in. Onboarding a new database is a two-step process: opt tables in at the catalog level, then register the database in the Helm chart so the orchestrator is allowed to automate it. Adding the database to health sync is recommended when you want cached health snapshots precomputed before orchestrator runs.
Step 1 — Opt tables in via Spark SQL
Each table must declare that it wants Snowpack maintenance by setting the
snowpack.maintenance_enabled table property. Connect to Spark (or Kyuubi) and
run:
```sql
ALTER TABLE lakehouse_dev.<database>.<table>
  SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');
```

Replace <database> and <table> with the actual database and table names.
Repeat for every table in the database that should receive automated
maintenance.
Per-table cadence override. By default the orchestrator respects the
cluster-wide cadenceHours value (6 hours in dev). To override the cadence for
a specific table, set the snowpack.maintenance_cadence_hours property at the
same time:
```sql
ALTER TABLE lakehouse_dev.<database>.<table>
  SET TBLPROPERTIES (
    'snowpack.maintenance_enabled' = 'true',
    'snowpack.maintenance_cadence_hours' = '12'
  );
```

Tables without the snowpack.maintenance_enabled property, or with it set to any
value other than true, are ignored by the orchestrator.
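Opting tables in one at a time gets tedious for a large database. As a sketch, a helper like the following can generate the ALTER TABLE statements for a list of tables so they can be reviewed before being piped into spark-sql or your Kyuubi client; the database and table names below are placeholders, not real ones.

```shell
#!/usr/bin/env sh
# Sketch: generate opt-in statements for a list of tables.
# The database name and table list are placeholders -- adjust as needed.
DB="my_database"                       # hypothetical database name
TABLES="orders line_items customers"   # hypothetical table names

generate_opt_in_sql() {
  db="$1"; shift
  for tbl in "$@"; do
    printf "ALTER TABLE lakehouse_dev.%s.%s SET TBLPROPERTIES ('snowpack.maintenance_enabled' = 'true');\n" "$db" "$tbl"
  done
}

# Print the statements for review; pipe into your SQL client to execute, e.g.:
#   generate_opt_in_sql "$DB" $TABLES | spark-sql
generate_opt_in_sql "$DB" $TABLES
```

Reviewing the generated SQL before executing it keeps a typo in the table list from silently opting in the wrong table.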
Step 2 — Add the database to Helm values
Open charts/snowpack/values-dev.yaml and add the database name to
orchestrator.includeDatabases. Add it to healthSync.databases as well when
you want the 15-minute health-sync CronJob to precompute cached health for that
database. These values are comma-separated strings:
```yaml
healthSync:
  databases: "offer_service,points_service,<new_database>"

orchestrator:
  includeDatabases: "offer_service,points_service,<new_database>"
```

Step 3 — Deploy via Terraform
All Snowpack infrastructure changes are deployed through Terraform. Never run
helm install or helm upgrade directly — Terraform owns the Helm release
and direct Helm commands cause state drift.
```shell
terraform apply
```

If you modified any files under charts/snowpack/templates/, remember to bump
the version field in charts/snowpack/Chart.yaml as well. Terraform detects
chart changes by comparing the chart version; template-only edits without a
version bump are invisible to the plan.
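Because a forgotten version bump silently no-ops the deploy, it can help to script the bump. A minimal sketch, assuming Chart.yaml contains a semver line like `version: 1.4.2` (the function name is illustrative):

```shell
#!/usr/bin/env sh
# Sketch: bump the patch component of the chart's semver version field.
# Assumes a line like "version: 1.4.2" in Chart.yaml.
bump_chart_patch() {
  chart="$1"
  awk '
    /^version:/ {
      split($2, v, ".")
      printf "version: %d.%d.%d\n", v[1], v[2], v[3] + 1
      next
    }
    { print }
  ' "$chart" > "$chart.tmp" && mv "$chart.tmp" "$chart"
}

# Usage: bump_chart_patch charts/snowpack/Chart.yaml && terraform apply
```

Running the bump and the apply as one step keeps the chart version and the template changes in the same commit.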
Step 4 — Verify
After Terraform applies successfully, wait for the next CronJob run. In the
dev environment the orchestrator runs hourly at :30 past the hour.
Check recent orchestrator runs to confirm the new database’s tables were assessed:
```shell
curl -s https://<snowpack-host>/orchestrator/runs | jq '.[0]'
```

A successful run includes tables_assessed, jobs_submitted, and
jobs_completed counts. If the new tables do not appear, verify that:
- The table property snowpack.maintenance_enabled is set to true in the catalog.
- The database is listed in orchestrator.includeDatabases in the deployed values.
- If you expect cached health, the database is listed in healthSync.databases and the health-sync CronJob has completed at least one cycle since the deploy (runs every 15 minutes).
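For repeated checks, the run inspection can be scripted. A sketch: fetch the latest run record (as with the curl | jq command above) and confirm the three count fields are present. The run_has_counts helper and the sample JSON below are illustrative, not real API output.

```shell
#!/usr/bin/env sh
# Sketch: verify a run record contains the expected count fields.
# In practice the JSON would come from:
#   curl -s "https://<snowpack-host>/orchestrator/runs" | jq '.[0]'
run_has_counts() {
  json="$1"
  for field in tables_assessed jobs_submitted jobs_completed; do
    printf '%s' "$json" | grep -q "\"$field\"" || { echo "missing: $field"; return 1; }
  done
  echo "run record looks complete"
}

# Illustrative sample run record (not real orchestrator output):
SAMPLE='{"tables_assessed": 12, "jobs_submitted": 3, "jobs_completed": 3}'
run_has_counts "$SAMPLE"
```

A non-zero exit from the helper makes it easy to wire this into a post-deploy smoke check.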
You can also confirm a specific table is visible in the cache:
```shell
curl -s "https://<snowpack-host>/tables?database=<new_database>"
```

This returns the list of tables Snowpack knows about for that database. If the list is empty, the API table-cache sync has not discovered the database yet or the catalog cannot list it.