Version 3.0.0

Integrations — Iceberg

Integration overview

The tdp-iceberg chart requires access to S3-compatible storage (Apache Ozone S3 Gateway, MinIO, or another S3 endpoint) and to Hive Metastore as the table catalog.

This page focuses on what the environment must have ready for maintenance jobs to work. Enabling jobs, schedules, and commands is covered in Iceberg configuration.

S3 / MinIO

Maintenance CronJobs require a Secret with S3 access credentials.

Create the Secret

Terminal input
kubectl -n <namespace> create secret generic s3-credentials \
  --from-literal=access-key='<ACCESS_KEY>' \
  --from-literal=secret-key='<SECRET_KEY>'
Credentials

Store credentials in a separate values file (outside Git) or in an existing Kubernetes Secret. Never commit them directly in the repository.
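
One way to keep the keys out of files tracked by Git is to read them from environment variables and apply the Secret idempotently. The following is a minimal sketch, not the chart's official procedure: the function name apply_s3_secret and the variable names S3_ACCESS_KEY and S3_SECRET_KEY are hypothetical, while the Secret name and its keys match the command above.

```shell
# Hypothetical helper: create or update the s3-credentials Secret from
# environment variables (S3_ACCESS_KEY / S3_SECRET_KEY are assumed names).
apply_s3_secret() {
  namespace="${1:-}"
  if [ -z "$namespace" ] || [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "usage: S3_ACCESS_KEY=... S3_SECRET_KEY=... apply_s3_secret <namespace>" >&2
    return 1
  fi
  # --dry-run=client renders the manifest locally; piping it to
  # `kubectl apply` keeps the command safe to re-run when the Secret exists.
  kubectl -n "$namespace" create secret generic s3-credentials \
    --from-literal=access-key="$S3_ACCESS_KEY" \
    --from-literal=secret-key="$S3_SECRET_KEY" \
    --dry-run=client -o yaml |
    kubectl -n "$namespace" apply -f -
}
```

With both variables set in the current shell, calling apply_s3_secret <namespace> creates the Secret on the first run and updates it on later runs.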

Hive Metastore

By default, the Iceberg catalog uses a Hive Metastore. Configure the connection URI via maintenance.spark.config:

maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<metastore-service>.<namespace>.svc.cluster.local:9083"
Tip

A typical value is thrift://metastore.hive-metastore.svc.cluster.local:9083; adjust host and namespace for your environment.
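
The URI follows the usual in-cluster DNS pattern, so it can be derived from the Service name and namespace. The sketch below is illustrative only: both function names are hypothetical, and the default port 9083 is taken from the examples on this page rather than from any chart default.

```shell
# Hypothetical helper: build the in-cluster Thrift URI for a Metastore Service.
metastore_uri() {
  service="$1"
  namespace="$2"
  port="${3:-9083}"   # 9083 is the port used in the examples on this page
  printf 'thrift://%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}

# Hypothetical helper: list the ports published by the Service, to confirm
# that the port used in the URI is actually exposed.
metastore_ports() {
  kubectl -n "$2" get service "$1" -o jsonpath='{.spec.ports[*].port}'
}
```

For example, metastore_uri metastore hive-metastore prints the typical value quoted in the tip, and metastore_ports metastore hive-metastore shows which ports the Service exposes.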

Spark connection configuration

Full example with Iceberg catalog and S3 endpoint:

maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<metastore-service>.<namespace>.svc.cluster.local:9083"
      "spark.hadoop.fs.s3a.endpoint": "http://<s3-host>.<namespace>.svc.cluster.local:9000"
      "spark.hadoop.fs.s3a.path.style.access": "true"

Connection parameters

| Parameter | Description | Example |
| --- | --- | --- |
| spark.sql.catalog.iceberg | Iceberg catalog class for Spark | org.apache.iceberg.spark.SparkCatalog |
| spark.sql.catalog.iceberg.type | Catalog type | hive |
| spark.sql.catalog.iceberg.uri | Hive Metastore URI | thrift://metastore.<ns>.svc.cluster.local:9083 |
| spark.hadoop.fs.s3a.endpoint | S3 endpoint URL | http://ozone-s3g.<ns>.svc.cluster.local:9000 |
| spark.hadoop.fs.s3a.path.style.access | Force path-style (required for MinIO/Ozone) | "true" |
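
These parameters can be collected into a dedicated values file. As a rough sketch, the function below prints such a file to standard output from two inputs; the function name and its arguments are hypothetical, and the keys are exactly those listed in the table.

```shell
# Hypothetical helper: print a values file with the Iceberg catalog and
# S3 settings, given the Metastore URI and the S3 endpoint URL.
render_integration_values() {
  uri="$1"        # e.g. thrift://metastore.hive-metastore.svc.cluster.local:9083
  endpoint="$2"   # e.g. http://<s3-host>.<namespace>.svc.cluster.local:9000
  cat <<EOF
maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "${uri}"
      "spark.hadoop.fs.s3a.endpoint": "${endpoint}"
      "spark.hadoop.fs.s3a.path.style.access": "true"
EOF
}
```

Redirecting the output, for instance render_integration_values <uri> <endpoint> > values-integration.yaml, produces a file that can be passed to Helm with -f.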

Trino

Trino can query Iceberg tables via the Iceberg connector. Configuration is done in the tdp-trino chart — see Trino configuration.

Spark

Iceberg tables can be processed directly by Spark. Configuration is done in the tdp-spark chart — see Spark configuration.

JupyterLab

JupyterLab can query Iceberg tables through Spark, provided that:

  • Jupyter ↔ Spark integration is working;
  • the Iceberg catalog is configured in Spark;
  • Hive Metastore and the S3/MinIO endpoint are reachable.

Airflow

Airflow can orchestrate pipelines that operate on Iceberg tables. Connection configuration is done in the tdp-airflow chart — see Integrations — Airflow.

Combining values files

Pass each values file to Helm with its own -f flag. When the same key appears in more than one file, the file listed last takes precedence.

Terminal input
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-iceberg \
  -n <namespace> \
  -f my-values.yaml \
  -f values-integration.yaml
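
To confirm which values Helm stored for the release after merging the files, the upgrade can be followed by helm get values. This is a minimal sketch with a hypothetical function name; the chart reference and file names are the ones used in the command above.

```shell
# Hypothetical wrapper: install or upgrade the release with both values
# files, then print the user-supplied values Helm recorded for it.
deploy_and_show_values() {
  release="$1"
  namespace="$2"
  helm upgrade --install "$release" \
    oci://registry.tecnisys.com.br/tdp/charts/tdp-iceberg \
    -n "$namespace" \
    -f my-values.yaml \
    -f values-integration.yaml &&
    helm get values "$release" -n "$namespace"
}
```

The second command runs only if the upgrade succeeds, so its output reflects the merged values of the revision that was just deployed.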