Integrations — Spark
Integration overview
The tdp-spark chart provides optional Delta Lake and Iceberg blocks, as well as Jupyter and Airflow integration via integration.jupyter and integration.airflow. S3/S3A configuration and metastore setup depend on your environment.
Delta Lake and Iceberg
Enable these blocks in values.yaml when applicable, and populate spark.sparkConf / customSparkConfig.properties according to each library's documentation and the JARs available in your Spark image.
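As an illustration, a minimal values.yaml fragment might look like the sketch below. The top-level block names (delta, iceberg) are assumptions for this example — confirm the actual keys against the chart's values.yaml. The two sparkConf entries are the standard Delta Lake client settings from the Delta Lake documentation and require the matching JARs in your image:

```yaml
# Hypothetical sketch — block names (delta, iceberg) are assumptions;
# check the chart's values.yaml for the keys it actually exposes.
delta:
  enabled: true
iceberg:
  enabled: true
spark:
  sparkConf:
    # Standard Delta Lake client settings (see the Delta Lake docs).
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
```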
Typical installation (adjust -f to your values files):
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-spark \
  -n <namespace> \
  -f values.yaml
Iceberg + Hive Metastore + S3-compatible storage (example)
Replace the metastore, endpoint, and credentials with the client's actual values:
spark:
  sparkConf:
    "spark.sql.catalog.iceberg": "org.apache.iceberg.spark.SparkCatalog"
    "spark.sql.catalog.iceberg.type": "hive"
    "spark.sql.catalog.iceberg.uri": "thrift://<hive-metastore-service>.<namespace>.svc.cluster.local:9083"
    "spark.sql.catalog.iceberg.warehouse": "s3a://<bucket>/hive"
    "spark.sql.catalog.iceberg.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
    "spark.hadoop.fs.s3a.endpoint": "https://<s3-endpoint>"
    "spark.hadoop.fs.s3a.access.key": "<access-key>"
    "spark.hadoop.fs.s3a.secret.key": "<secret-key>"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    "spark.sql.defaultCatalog": "iceberg"
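With a catalog configured like the above, a quick smoke test can be run from spark-sql or a notebook. The database and table names below are placeholders, and the statements assume "iceberg" is the default catalog as configured:

```sql
-- Assumes the "iceberg" catalog above is active and set as default.
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE demo.events (id BIGINT, ts TIMESTAMP) USING iceberg;
INSERT INTO demo.events VALUES (1, current_timestamp());
SELECT * FROM demo.events;
```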
Integration with Jupyter
In the tdp-spark chart, integration is done via integration.jupyter.sparkConfig (a ConfigMap rendered by the chart). Point spark.master and other keys to the Spark service in your deployment:
integration:
  jupyter:
    enabled: true
    sparkConfig:
      "spark.master": "spark://<spark-master-service>.<namespace>.svc.cluster.local:7077"
      # Optional: network/port adjustments for the notebook/driver in the cluster
      "spark.driver.bindAddress": "0.0.0.0"
If Jupyter is deployed by another chart (e.g. tdp-jupyter), refer to that chart's documentation for cross-namespace considerations, network policies, and URLs — the tdp-spark contract is to expose the integration.jupyter block as Spark client defaults.
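Conceptually, the chart materializes integration.jupyter.sparkConfig as Spark client defaults: each key/value pair becomes one `key value` line in a spark-defaults.conf-style file that the client reads at startup. A small sketch of that rendering (the helper name is ours, not part of the chart):

```python
def render_spark_defaults(spark_config: dict[str, str]) -> str:
    """Render a sparkConfig mapping as spark-defaults.conf lines.

    spark-defaults.conf uses one whitespace-separated `key value` pair per
    line, which is the shape a Spark client (e.g. a Jupyter kernel's
    SparkSession) picks up as defaults.
    """
    return "\n".join(f"{key} {value}" for key, value in sorted(spark_config.items()))


if __name__ == "__main__":
    # Values mirroring the integration.jupyter.sparkConfig example above.
    conf = {
        "spark.master": "spark://spark-master.default.svc.cluster.local:7077",
        "spark.driver.bindAddress": "0.0.0.0",
    }
    print(render_spark_defaults(conf))
```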
Integration with Airflow
Analogous to the Jupyter block, integration.airflow.sparkConfig exposes Spark client defaults for jobs submitted from Airflow. Point spark.master at the Spark service and size driver/executor resources as appropriate:
integration:
  airflow:
    enabled: true
    sparkConfig:
      "spark.master": "spark://<spark-master-service>.<namespace>.svc.cluster.local:7077"
      "spark.driver.memory": "1g"
      "spark.executor.memory": "2g"
      "spark.executor.cores": "1"
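Airflow's Spark submission path ultimately invokes spark-submit, where spark.master is passed via --master and every other sparkConfig entry becomes a `--conf key=value` argument. The following sketch of that translation is illustrative (the function is ours, not part of the chart or of Airflow):

```python
def to_spark_submit_args(spark_config: dict[str, str]) -> list[str]:
    """Expand a sparkConfig mapping into spark-submit arguments.

    spark.master maps to --master; all other entries become `--conf
    key=value`, which is how spark-submit accepts arbitrary configuration.
    """
    args: list[str] = []
    master = spark_config.get("spark.master")
    if master:
        args += ["--master", master]
    for key, value in sorted(spark_config.items()):
        if key != "spark.master":
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Values mirroring the integration.airflow.sparkConfig example above.
    conf = {
        "spark.master": "spark://spark-master.default.svc.cluster.local:7077",
        "spark.driver.memory": "1g",
        "spark.executor.memory": "2g",
        "spark.executor.cores": "1",
    }
    print(to_spark_submit_args(conf))
```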
See Airflow Configuration to align operators and dependencies with this sparkConfig.