Version 3.0.0

Security — Spark

The tdp-spark chart accesses S3/MinIO storage via the S3A protocol. Access credentials are configured as placeholders in values.yaml and must be supplied through private values files or your environment's Secrets management mechanisms.

S3/S3A credentials

S3/MinIO access credentials are injected via spark.sparkConf or hadoopConfig:

spark:
  sparkConf:
    "spark.hadoop.fs.s3a.access.key": "<ACCESS_KEY>"
    "spark.hadoop.fs.s3a.secret.key": "<SECRET_KEY>"
    "spark.hadoop.fs.s3a.endpoint": "http://<s3-endpoint>:<port>"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    "spark.hadoop.fs.s3a.connection.ssl.enabled": "false"

Alternatively, the same properties can be defined in hadoopConfig (rendered as core-site.xml):

hadoopConfig:
  "fs.s3a.access.key": "<ACCESS_KEY>"
  "fs.s3a.secret.key": "<SECRET_KEY>"
  "fs.s3a.endpoint": "http://<s3-endpoint>:<port>"
  "fs.s3a.path.style.access": "true"

Or via customSparkConfig.properties (rendered as spark-defaults.conf):

customSparkConfig:
  properties: |
    spark.hadoop.fs.s3a.access.key=<ACCESS_KEY>
    spark.hadoop.fs.s3a.secret.key=<SECRET_KEY>
    spark.hadoop.fs.s3a.endpoint=http://<s3-endpoint>:<port>
    spark.hadoop.fs.s3a.path.style.access=true
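The sparkConf and customSparkConfig forms use keys prefixed with spark.hadoop., while hadoopConfig uses bare fs.s3a.* keys. Spark copies every spark.hadoop.* property into the Hadoop configuration with that prefix removed, so the three forms describe the same settings. The Python sketch below illustrates the mapping only. The helper name to_hadoop_config and the sample endpoint are hypothetical and are not part of the chart.

```python
from __future__ import annotations

SPARK_HADOOP_PREFIX = "spark.hadoop."


def to_hadoop_config(spark_conf: dict[str, str]) -> dict[str, str]:
    """Return the Hadoop-level view of the spark.hadoop.* entries.

    Entries without the prefix are Spark-only settings and are skipped.
    """
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


# Hypothetical sample values, for illustration only.
spark_conf = {
    "spark.hadoop.fs.s3a.endpoint": "http://s3.example.internal:9000",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.executor.memory": "2g",
}
hadoop_conf = to_hadoop_config(spark_conf)
print(hadoop_conf)
```

Only the two fs.s3a.* entries survive the mapping; spark.executor.memory has no Hadoop counterpart and is dropped.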

Integration with Ozone (TDP)

For Ozone installed with TDP, the internal S3 Gateway endpoint follows the pattern:

http://<release>-s3g-rest.<namespace>.svc.cluster.local:9878
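As an illustration of the pattern above, the following Python sketch fills in the release and namespace. The function name ozone_s3g_endpoint and the sample values "ozone" and "tdp" are assumptions made for the example; substitute the release name and namespace of your own Ozone installation.

```python
def ozone_s3g_endpoint(release: str, namespace: str, port: int = 9878) -> str:
    """Build the in-cluster S3 Gateway URL from the documented pattern."""
    return f"http://{release}-s3g-rest.{namespace}.svc.cluster.local:{port}"


# Hypothetical release and namespace, for illustration only.
endpoint = ozone_s3g_endpoint("ozone", "tdp")
print(endpoint)
```

The resulting string is the value to use for fs.s3a.endpoint (or spark.hadoop.fs.s3a.endpoint).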

The credentials must match the ozone-s3-credentials Secret configured in Ozone. See Security — Apache Ozone for details on the Secret.

Integration with Jupyter and Airflow

The chart renders specific configuration files for integration with other services:

  • integration.jupyter.sparkConfig → spark-defaults.conf for the JupyterLab environment
  • integration.airflow.sparkConfig → default settings for job submission via Airflow

These configurations also inherit the S3A properties defined above when present.

Best practices

| Aspect | Recommendation |
|---|---|
| Credentials | Do not commit access.key and secret.key values to a Git repository |
| Environment separation | Use distinct values files for development and production |
| Rotation | Update the values and run helm upgrade to apply new credentials |
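Because values.yaml ships with placeholders such as <ACCESS_KEY>, a pre-deployment check can catch values that were never replaced. The Python sketch below is one possible approach and is not part of the chart: find_placeholders is a hypothetical helper, and treating any <...> token as an unresolved placeholder is an assumption. It operates on values already loaded into a dictionary, using whatever YAML loader you prefer.

```python
import re

# Assumption: unresolved placeholders look like <ACCESS_KEY> or <s3-endpoint>.
PLACEHOLDER = re.compile(r"<[A-Za-z0-9_-]+>")


def find_placeholders(values, path=""):
    """Return the paths of all string values that still contain a placeholder."""
    found = []
    if isinstance(values, dict):
        for key, value in values.items():
            child = f"{path}/{key}" if path else str(key)
            found.extend(find_placeholders(value, child))
    elif isinstance(values, list):
        for index, item in enumerate(values):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif isinstance(values, str) and PLACEHOLDER.search(values):
        found.append(path)
    return found


# Values as they would look after loading a values file into a dict.
values = {
    "spark": {
        "sparkConf": {
            "spark.hadoop.fs.s3a.access.key": "<ACCESS_KEY>",
            "spark.hadoop.fs.s3a.path.style.access": "true",
        }
    }
}
unresolved = find_placeholders(values)
print(unresolved)
```

A non-empty result means at least one credential or endpoint was left at its placeholder value, and the deployment should be stopped before helm upgrade runs.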

Troubleshooting

| Problem | Probable cause | Solution |
|---|---|---|
| Spark jobs fail with S3 access error | Missing credentials or incorrect endpoint | Check spark.hadoop.fs.s3a.* in the applied values |
| Connection refused to S3 | Incorrect S3 service address | Verify the Ozone service name and namespace in the cluster |
| AccessDeniedException | Credentials without permission on the bucket | Review permissions in MinIO/Ozone for the configured user |
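For the "Connection refused" case, a plain TCP check against the configured endpoint can help separate an address problem from a credentials problem. The Python sketch below is a generic diagnostic, not a tool shipped with the chart; the helper names endpoint_host_port and is_reachable are hypothetical. Cluster-internal names such as *.svc.cluster.local resolve only from inside the cluster, so the check would need to run from a pod.

```python
from __future__ import annotations

import socket
from urllib.parse import urlparse


def endpoint_host_port(endpoint: str) -> tuple[str, int]:
    """Extract host and port from an fs.s3a.endpoint URL."""
    parsed = urlparse(endpoint)
    if parsed.hostname is None:
        raise ValueError(f"no host in endpoint: {endpoint!r}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


# Hypothetical endpoint, for illustration only.
host, port = endpoint_host_port("http://ozone-s3g-rest.tdp.svc.cluster.local:9878")
print(host, port)
```

If is_reachable returns False for the endpoint in the applied values, the service name, namespace, or port is the likely culprit; if it returns True and jobs still fail, the credentials and bucket permissions are the next thing to review.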

For the full list of parameters, use helm show values on the version of the chart you installed.