Version 3.0

Integrations — Iceberg

Chart Version: 3.0.1 | Type: application | App Version: 1.10.0

Compatibility: Kubernetes 1.32+ | OpenShift 4.19+ | Rancher 2.10.x+

Integrations overview

The tdp-iceberg chart requires access to S3-compatible storage (Apache Ozone S3 Gateway, MinIO, or another S3 endpoint) and to a Hive Metastore that serves as the table catalog.

This page describes what the environment must provide for the maintenance jobs to run. Enabling the jobs, their schedules, and their commands is covered in Iceberg Configuration.

S3 / MinIO

Maintenance CronJobs require the s3-credentials Secret with S3 access credentials. Create it before deployment as described in Security — Iceberg.
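Because the Secret must exist before deployment, a small pre-flight check can catch a missing Secret early. The following is a minimal sketch, assuming kubectl is configured for the target cluster; the function name check_s3_secret is hypothetical and not part of the chart. It only confirms that a Secret named s3-credentials exists and does not validate its keys, which are defined in Security — Iceberg.

```shell
# Hypothetical helper (not shipped with the chart): succeed only if the
# s3-credentials Secret exists in the namespace where the chart will run.
check_s3_secret() {
  namespace="$1"
  if kubectl get secret s3-credentials --namespace "$namespace" >/dev/null 2>&1; then
    echo "s3-credentials found in ${namespace}"
  else
    echo "s3-credentials missing in ${namespace}; create it before deploying" >&2
    return 1
  fi
}
```

Running check_s3_secret with the target namespace as its only argument returns a non-zero status when the Secret is absent, so it can gate a deployment script.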

Hive Metastore

By default, the Iceberg catalog uses a Hive Metastore. Configure the connection URI via maintenance.spark.config:

maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<HIVE_METASTORE_SERVICE>.<NAMESPACE>.svc.cluster.local:9083"

Tip

A typical value is thrift://metastore.hive-metastore.svc.cluster.local:9083; adjust the host and namespace to your environment.
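The URI follows the in-cluster DNS pattern shown above, so it can be derived from the Service name and namespace. The sketch below assumes kubectl access to the cluster; both function names are hypothetical helpers, and port 9083 is taken from the examples on this page.

```shell
# Hypothetical helper: compose the Thrift URI for a Hive Metastore Service
# from its Service name and namespace (port 9083 as used on this page).
metastore_uri() {
  service="$1"
  namespace="$2"
  printf 'thrift://%s.%s.svc.cluster.local:9083\n' "$service" "$namespace"
}

# Hypothetical helper: confirm that the Service behind the URI exists.
check_metastore_service() {
  kubectl get service "$1" --namespace "$2"
}
```

For example, metastore_uri metastore hive-metastore prints the typical value quoted in the tip, and check_metastore_service with the same two arguments fails if no such Service is deployed.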

Configuring Spark connections

A complete example that configures both the Iceberg catalog and the S3 endpoint:

maintenance:
  spark:
    config:
      "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
      "spark.sql.catalog.iceberg.type": "hive"
      "spark.sql.catalog.iceberg.uri": "thrift://<HIVE_METASTORE_SERVICE>.<NAMESPACE>.svc.cluster.local:9083"
      "spark.hadoop.fs.s3a.endpoint": "http://<S3_ENDPOINT>.<NAMESPACE>.svc.cluster.local:9000"
      "spark.hadoop.fs.s3a.path.style.access": "true"

Connection parameters

Parameter	Description	Example
`spark.sql.catalog.iceberg`	Iceberg catalog class for Spark	`org.apache.iceberg.spark.SparkCatalog`
`spark.sql.catalog.iceberg.type`	Catalog type	`hive`
`spark.sql.catalog.iceberg.uri`	Hive Metastore URI	`thrift://metastore.<NAMESPACE>.svc.cluster.local:9083`
`spark.hadoop.fs.s3a.endpoint`	S3 endpoint URL	`http://ozone-s3g.<NAMESPACE>.svc.cluster.local:9000`
`spark.hadoop.fs.s3a.path.style.access`	Force path-style access (required for MinIO/Ozone)	`"true"`

Trino

Trino can query Iceberg tables through its Iceberg connector. This is configured in the tdp-trino chart; see Trino Configuration.

Spark

Spark can process Iceberg tables directly. This is configured in the tdp-spark chart; see Spark Configuration.

JupyterLab

JupyterLab can query Iceberg tables via Spark as long as:

the Jupyter ↔ Spark integration is functional;
the Iceberg catalog is configured in Spark;
the Hive Metastore and the S3/MinIO endpoint are accessible.

Airflow

Airflow can orchestrate pipelines that operate on Iceberg tables. The connection is configured in the tdp-airflow chart; see Integrations — Airflow.

Combining value files

Terminal input
helm upgrade --install <RELEASE_NAME> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-iceberg \
  -n <NAMESPACE> \
  -f my-values.yaml \
  -f values-integration.yaml
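When several -f files are passed, Helm gives precedence to the right-most one, so settings in values-integration.yaml override the same keys in my-values.yaml. One way to review the merged result before installing is to render the chart locally. The sketch below assumes a Helm version with OCI registry support; the function name preview_iceberg_release is hypothetical, and the file names are the ones used in the command above.

```shell
# Hypothetical helper: render the chart locally with both value files so the
# merged configuration can be reviewed. Later -f files override earlier ones.
preview_iceberg_release() {
  release="$1"
  namespace="$2"
  helm template "$release" \
    oci://registry.tecnisys.com.br/tdp/charts/tdp-iceberg \
    --namespace "$namespace" \
    -f my-values.yaml \
    -f values-integration.yaml
}
```

Calling preview_iceberg_release with the release name and namespace prints the rendered manifests without touching the cluster; the output can be searched for the catalog URI and the S3 endpoint to confirm which file's value was applied.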
