Integrations — Iceberg
Integration overview
The tdp-iceberg chart requires access to S3-compatible storage (Apache Ozone S3 Gateway, MinIO, or another S3 endpoint) and to Hive Metastore as the table catalog.
This page focuses on what the environment must have ready for maintenance jobs to work. Enabling jobs, schedules, and commands is covered in Iceberg configuration.
S3 / MinIO
Maintenance CronJobs require a Secret with S3 access credentials.
Create the Secret
kubectl -n <namespace> create secret generic s3-credentials \
  --from-literal=access-key='<ACCESS_KEY>' \
  --from-literal=secret-key='<SECRET_KEY>'
Store credentials in a separate values file kept outside Git, or in an existing Kubernetes Secret. Never commit them to the repository.
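Before enabling maintenance jobs, it can help to confirm that the Secret holds both keys. The sketch below is one possible check, not part of the chart: has_s3_keys is a hypothetical helper name, and it assumes kubectl is installed and pointed at the target cluster. The key names access-key and secret-key are the ones used in the command above.

```shell
# Hypothetical helper (not shipped with the chart).
# Usage: has_s3_keys NAMESPACE SECRET_NAME
# Succeeds only if the Secret contains both access-key and secret-key.
has_s3_keys() {
  # List the key names stored under .data, one per line.
  keys="$(kubectl -n "$1" get secret "$2" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')" || return 1
  # -x requires a whole-line match, so "access-key-old" would not count.
  printf '%s\n' "$keys" | grep -qx 'access-key' &&
    printf '%s\n' "$keys" | grep -qx 'secret-key'
}
```

For example, has_s3_keys my-namespace s3-credentials returns a non-zero exit status when the Secret is missing or when either key is absent.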
Hive Metastore
By default, the Iceberg catalog uses a Hive Metastore. Configure the connection URI via maintenance.spark.config:
maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<metastore-service>.<namespace>.svc.cluster.local:9083"
A typical value is thrift://metastore.hive-metastore.svc.cluster.local:9083; adjust host and namespace for your environment.
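If the URI is set on the command line instead of in a values file, the property name needs escaping, because Helm's --set syntax treats an unescaped dot as a nesting separator. The following is a minimal sketch under that assumption. metastore_uri and metastore_set_arg are hypothetical helper names, and passing this key through --set-string is an assumption about how you drive the chart, not something this page prescribes.

```shell
# Hypothetical helpers (not shipped with the chart).

# Build the Hive Metastore URI from a Service name, a namespace, and an
# optional port. The default 9083 matches the examples on this page.
metastore_uri() {
  printf 'thrift://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "${3:-9083}"
}

# Print a key=value argument suitable for helm --set-string.
# The dots inside the Spark property name are backslash-escaped so that
# Helm reads it as one key under maintenance.spark.config.
metastore_set_arg() {
  printf 'maintenance.spark.config.spark\\.sql\\.catalog\\.iceberg\\.uri=%s\n' \
    "$(metastore_uri "$@")"
}
```

The output of metastore_set_arg metastore hive-metastore can then be passed as a single quoted argument after --set-string in a helm upgrade command.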
Spark connection configuration
Full example with Iceberg catalog and S3 endpoint:
maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<metastore-service>.<namespace>.svc.cluster.local:9083"
      "spark.hadoop.fs.s3a.endpoint": "http://<s3-host>.<namespace>.svc.cluster.local:9000"
      "spark.hadoop.fs.s3a.path.style.access": "true"
Connection parameters
| Parameter | Description | Example |
|---|---|---|
| spark.sql.catalog.iceberg | Iceberg catalog class for Spark | org.apache.iceberg.spark.SparkCatalog |
| spark.sql.catalog.iceberg.type | Catalog type | hive |
| spark.sql.catalog.iceberg.uri | Hive Metastore URI | thrift://metastore.<ns>.svc.cluster.local:9083 |
| spark.hadoop.fs.s3a.endpoint | S3 endpoint URL | http://ozone-s3g.<ns>.svc.cluster.local:9000 |
| spark.hadoop.fs.s3a.path.style.access | Force path-style access (required for MinIO/Ozone) | "true" |
Trino
Trino can query Iceberg tables via the Iceberg connector. Configuration is done in the tdp-trino chart — see Trino configuration.
Spark
Iceberg tables can be processed directly by Spark. Configuration is done in the tdp-spark chart — see Spark configuration.
JupyterLab
JupyterLab can query Iceberg tables through Spark, provided that:
- Jupyter ↔ Spark integration is working;
- the Iceberg catalog is configured in Spark;
- Hive Metastore and the S3/MinIO endpoint are reachable.
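The reachability condition in the list above can be checked from a terminal inside the cluster, for instance a JupyterLab terminal. This is a rough sketch, not a supported tool: host_port and can_connect are hypothetical helper names, the URI is assumed to carry an explicit port, and the nc (netcat) build is assumed to support the -z and -w options.

```shell
# Hypothetical helpers (not shipped with any TDP chart).

# Print "host port" for a URI such as thrift://host:9083 or
# http://host:9000/path. Assumes the URI includes an explicit port.
host_port() {
  rest="${1#*://}"     # drop the scheme
  rest="${rest%%/*}"   # drop any path after the authority
  printf '%s %s\n' "${rest%%:*}" "${rest##*:}"
}

# Succeed if a TCP connection to the URI's host and port opens within
# 3 seconds. This only proves the port is reachable, not that the
# service behind it is healthy.
can_connect() {
  set -- $(host_port "$1")
  nc -z -w 3 "$1" "$2"
}
```

Running can_connect against the Metastore URI and the S3 endpoint URL configured for Spark gives a quick pass or fail for each.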
Airflow
Airflow can orchestrate pipelines that operate on Iceberg tables. Connection configuration is done in the tdp-airflow chart — see Integrations — Airflow.
Combining values files
Pass each values file with its own -f flag. Helm merges the files in the order given, and when the same key appears in more than one file, the value from the rightmost file wins.
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-iceberg \
  -n <namespace> \
  -f my-values.yaml \
  -f values-integration.yaml