Version 3.0.0

Integrations — Delta Lake

Integration overview

The tdp-deltalake chart requires access to S3-compatible storage (Apache Ozone S3 Gateway, MinIO, or another S3 endpoint) and can be integrated with the main processing and orchestration tools in TDP.

S3 / MinIO

Maintenance CronJobs require a Secret with S3 access credentials.

Create the Secret

Terminal input
kubectl -n <namespace> create secret generic s3-credentials \
  --from-literal=access-key='<ACCESS_KEY>' \
  --from-literal=secret-key='<SECRET_KEY>'

Credentials

Store credentials in a separate values file (outside Git) or in an existing Kubernetes Secret. Never commit them directly in the repository.
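
One way to follow this advice is to read the credentials from environment variables instead of typing them inline. The sketch below is an illustration only: the variable names NAMESPACE, S3_ACCESS_KEY, and S3_SECRET_KEY, the KUBECTL override, and the function name create_s3_secret are assumptions and are not defined by the chart.

```shell
#!/bin/sh
# Sketch: create the s3-credentials Secret from environment variables.
# Assumptions (not from the chart): NAMESPACE, S3_ACCESS_KEY and S3_SECRET_KEY
# are hypothetical variable names; KUBECTL lets you point at another binary.
KUBECTL="${KUBECTL:-kubectl}"

create_s3_secret() {
  # Refuse to run with missing inputs instead of creating an empty Secret.
  if [ -z "${NAMESPACE:-}" ] || [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "NAMESPACE, S3_ACCESS_KEY and S3_SECRET_KEY must be set" >&2
    return 1
  fi
  # Same command as above, with the literals taken from the environment.
  "$KUBECTL" -n "$NAMESPACE" create secret generic s3-credentials \
    --from-literal=access-key="$S3_ACCESS_KEY" \
    --from-literal=secret-key="$S3_SECRET_KEY"
}
```

After exporting the three variables, call create_s3_secret. To preview the manifest without creating anything, the same kubectl command accepts --dry-run=client -o yaml.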

Configure the S3 endpoint

maintenance:
  spark:
    config:
      "spark.hadoop.fs.s3a.endpoint": "http://<s3-host>.<namespace>.svc.cluster.local:9000"
      "spark.hadoop.fs.s3a.path.style.access": "true"

S3 parameters

| Parameter | Description | Example |
| --- | --- | --- |
| spark.hadoop.fs.s3a.endpoint | S3 endpoint URL | http://ozone-s3g.<ns>.svc.cluster.local:9000 |
| spark.hadoop.fs.s3a.path.style.access | Force path-style (required for MinIO/Ozone) | "true" |
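
The two parameters above can be rendered into a dedicated values file from the host and namespace of your S3 service. This is a minimal sketch: the function name write_s3_values, its argument order, and the default port 9000 (taken from the examples on this page) are assumptions. The default output name values-integration.yaml matches the file used under "Combining values files".

```shell
#!/bin/sh
# Sketch: write the S3 endpoint settings into a Helm values file.
# Assumptions: the function name and argument order are illustrative only.
# Usage: write_s3_values <s3-host> <namespace> [port] [output-file]
write_s3_values() {
  s3_host="$1"
  namespace="$2"
  port="${3:-9000}"
  out="${4:-values-integration.yaml}"
  # The heredoc body starts at column 0 so the YAML indentation is preserved.
  cat > "$out" <<EOF
maintenance:
  spark:
    config:
      "spark.hadoop.fs.s3a.endpoint": "http://${s3_host}.${namespace}.svc.cluster.local:${port}"
      "spark.hadoop.fs.s3a.path.style.access": "true"
EOF
}
```

For example, write_s3_values minio tdp writes a file that points the maintenance jobs at http://minio.tdp.svc.cluster.local:9000.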

Trino

Trino can query Delta Lake tables via the Delta connector. Configuration is done in the tdp-trino chart — see Trino configuration.

Spark

Delta Lake tables can be processed directly by Spark. Configuration is done in the tdp-spark chart — see Spark configuration.

Airflow

Airflow can orchestrate pipelines that read and write Delta Lake tables. Connection configuration is done in the tdp-airflow chart — see Integrations — Airflow.

Combining values files

Terminal input
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-deltalake \
  -n <namespace> \
  -f my-values.yaml \
  -f values-integration.yaml
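
Helm merges the files passed with -f from left to right, so when both files set the same key, the value in values-integration.yaml takes precedence. The wrapper below is a hedged sketch that checks every values file exists before running the upgrade. The names deploy_deltalake, HELM, and CHART are illustrative assumptions, not part of the chart.

```shell
#!/bin/sh
# Sketch: run the upgrade only if every values file exists.
# Assumptions: deploy_deltalake, HELM and CHART are illustrative names.
# Usage: deploy_deltalake <release> <namespace> <values-file>...
HELM="${HELM:-helm}"
CHART="oci://registry.tecnisys.com.br/tdp/charts/tdp-deltalake"

deploy_deltalake() {
  release="$1"
  namespace="$2"
  shift 2
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "values file not found: $f" >&2
      return 1
    fi
  done
  # Turn "a b" into "-f a -f b", keeping the order (later files win).
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -f "$1"
    shift
    n=$((n - 1))
  done
  "$HELM" upgrade --install "$release" "$CHART" -n "$namespace" "$@"
}
```

Calling deploy_deltalake <release> <namespace> my-values.yaml values-integration.yaml runs the same command as above once both files are present.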