Integrations — Spark
Integration overview
The tdp-spark chart provides optional Delta Lake and Iceberg blocks, as well as Jupyter and Airflow integration via integration.jupyter and integration.airflow. S3/S3A configuration and metastore setup depend on your environment.
Delta Lake and Iceberg
Enable these blocks in values.yaml when applicable, and populate spark.sparkConf / customSparkConfig.properties according to each library's documentation and the JARs available in your Spark image.
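As an illustration, a minimal values.yaml fragment might look like the sketch below. The top-level block names (delta, iceberg) are assumptions for this example — confirm the actual keys against the chart's values.yaml. The two sparkConf entries are the standard Delta Lake client settings from the Delta Lake documentation and require the matching JARs in your image:

```yaml
# Hypothetical sketch — block names (delta, iceberg) are assumptions;
# check the chart's values.yaml for the keys it actually exposes.
delta:
  enabled: true
iceberg:
  enabled: true
spark:
  sparkConf:
    # Standard Delta Lake client settings (see the Delta Lake docs).
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
```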
Typical installation (adjust -f to your values files):
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-spark \
  -n <namespace> \
  -f values.yaml
Iceberg + Hive Metastore + S3-compatible storage (example)
Replace the metastore, endpoint, and credentials with the client's actual values:
spark:
  sparkConf:
    "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
    "spark.sql.catalog.iceberg.type": "hive"
    "spark.sql.catalog.iceberg.uri": "thrift://<hive-metastore-service>.<namespace>.svc.cluster.local:9083"
    "spark.sql.catalog.iceberg.warehouse": "s3a://<bucket>/hive"
    "spark.sql.catalog.iceberg.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
    "spark.hadoop.fs.s3a.endpoint": "https://<s3-endpoint>"
    "spark.hadoop.fs.s3a.access.key": "<access-key>"
    "spark.hadoop.fs.s3a.secret.key": "<secret-key>"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    "spark.sql.defaultCatalog": "iceberg"
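With a catalog configured like the above, a quick smoke test can be run from spark-sql or a notebook. The database and table names below are placeholders, and the statements assume "iceberg" is the default catalog as configured:

```sql
-- Assumes the "iceberg" catalog above is active and set as default.
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE demo.events (id BIGINT, ts TIMESTAMP) USING iceberg;
INSERT INTO demo.events VALUES (1, current_timestamp());
SELECT * FROM demo.events;
```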
Integration with Jupyter
In the tdp-spark chart, integration is done via integration.jupyter.sparkConfig (a ConfigMap rendered by the chart). Point spark.master and other keys to the Spark service in your deployment:
integration:
  jupyter:
    enabled: true
    sparkConfig:
      "spark.master": "spark://<spark-master-service>.<namespace>.svc.cluster.local:7077"
      # Optional: network/port adjustments for the notebook/driver in the cluster
      "spark.driver.bindAddress": "0.0.0.0"
If Jupyter is deployed by another chart (e.g. tdp-jupyter), refer to that chart's documentation for cross-namespace considerations, network policies, and URLs — the tdp-spark contract is to expose the integration.jupyter block as Spark client defaults.
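Conceptually, the chart materializes integration.jupyter.sparkConfig as Spark client defaults: each key/value pair becomes one `key value` line in a spark-defaults.conf-style file that the client reads at startup. A small sketch of that rendering (the helper name is ours, not part of the chart):

```python
def render_spark_defaults(spark_config: dict[str, str]) -> str:
    """Render a sparkConfig mapping as spark-defaults.conf lines.

    spark-defaults.conf uses one whitespace-separated `key value` pair per
    line, which is the shape a Spark client (e.g. a Jupyter kernel's
    SparkSession) picks up as defaults.
    """
    return "\n".join(f"{key} {value}" for key, value in sorted(spark_config.items()))


if __name__ == "__main__":
    # Values mirroring the integration.jupyter.sparkConfig example above.
    conf = {
        "spark.master": "spark://spark-master.default.svc.cluster.local:7077",
        "spark.driver.bindAddress": "0.0.0.0",
    }
    print(render_spark_defaults(conf))
```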
Integration with Airflow
Analogous to the Jupyter block, integration.airflow.sparkConfig exposes Spark client defaults for jobs submitted from Airflow. Point spark.master at the Spark service and size driver/executor resources as appropriate:
integration:
  airflow:
    enabled: true
    sparkConfig:
      "spark.master": "spark://<spark-master-service>.<namespace>.svc.cluster.local:7077"
      "spark.driver.memory": "1g"
      "spark.executor.memory": "2g"
      "spark.executor.cores": "1"
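Airflow's Spark submission path ultimately invokes spark-submit, where spark.master is passed via --master and every other sparkConfig entry becomes a `--conf key=value` argument. The following sketch of that translation is illustrative (the function is ours, not part of the chart or of Airflow):

```python
def to_spark_submit_args(spark_config: dict[str, str]) -> list[str]:
    """Expand a sparkConfig mapping into spark-submit arguments.

    spark.master maps to --master; all other entries become `--conf
    key=value`, which is how spark-submit accepts arbitrary configuration.
    """
    args: list[str] = []
    master = spark_config.get("spark.master")
    if master:
        args += ["--master", master]
    for key, value in sorted(spark_config.items()):
        if key != "spark.master":
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Values mirroring the integration.airflow.sparkConfig example above.
    conf = {
        "spark.master": "spark://spark-master.default.svc.cluster.local:7077",
        "spark.driver.memory": "1g",
        "spark.executor.memory": "2g",
        "spark.executor.cores": "1",
    }
    print(to_spark_submit_args(conf))
```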
See Airflow Configuration to align operators and dependencies with this sparkConfig.