Apache Ozone Configuration
The tdp-ozone chart deploys Apache Ozone — S3-compatible object storage — on Kubernetes for TDP.
What is Apache Ozone?
Apache Ozone is a distributed object storage system compatible with the Amazon S3 API.
In TDP Kubernetes, it plays the role of S3: it is where data is stored — Parquet files, Delta Lake tables, Iceberg tables, logs, models.
The main advantage of Ozone over using Amazon S3 directly is that it runs inside the Kubernetes cluster, eliminating dependency on external services and reducing latency and data transfer costs.
See Apache Ozone — Concepts for a complete overview of the tool, its architecture and how it works.
How Ozone fits into TDP
Ozone is used as storage by Spark (via the S3A protocol), by Trino (via the native S3 connector), and optionally by ClickHouse.
From the applications' perspective, Ozone is transparent: they point to s3a://bucket/path and the S3 protocol handles the rest.
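To make the s3a:// indirection concrete, the sketch below builds the Hadoop S3A properties that would point a Spark job at an S3-compatible endpoint such as the Ozone S3 Gateway. The property names are standard Hadoop S3A settings. The endpoint host (ozone-s3g-rest.tdp.svc, that is, release "ozone" in namespace "tdp") and the helper names are assumptions for illustration, not values defined by the chart.

```python
from urllib.parse import urlparse


def s3a_properties(endpoint: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Build Hadoop S3A properties that send s3a:// traffic to a custom endpoint."""
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://, got {endpoint!r}")
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Address buckets as <endpoint>/<bucket> rather than <bucket>.<endpoint>,
        # which is what an in-cluster gateway without wildcard DNS needs.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if scheme == "https" else "false",
    }


def spark_conf_args(props: dict[str, str]) -> list[str]:
    """Turn Hadoop properties into spark-submit arguments (spark.hadoop.* prefix)."""
    args: list[str] = []
    for key, value in props.items():
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


# Hypothetical in-cluster endpoint; replace with the host you actually exposed.
EXAMPLE_ENDPOINT = "http://ozone-s3g-rest.tdp.svc:9878"
```

Passing the result of spark_conf_args(s3a_properties(EXAMPLE_ENDPOINT, key, secret)) to spark-submit is one way to apply these settings; the same keys can also be set in the Spark configuration of the job.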
Ozone architecture
Ozone is composed of four services, all of which must be available for the cluster to function. The S3 Gateway exposes two endpoints (REST and Web UI), so the table below has five rows:
| Service | Role | Port |
|---|---|---|
| S3 Gateway (REST) | S3-compatible endpoint for reading and writing data | 9878 |
| S3 Gateway (Web UI) | S3 Gateway monitoring web interface | 19878 |
| Ozone Manager (OM) | Manages namespaces, volumes, buckets, and metadata | 9874 |
| Storage Container Manager (SCM) | Manages where data blocks reside on datanodes | 9876 |
| Recon | Administrative and cluster monitoring interface | 9888 |
Datanodes are the nodes that actually store the data. The number of datanodes (ozone.datanode.replicas) determines the storage capacity and supported replication factor.
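As a rough illustration of how the datanode count relates to capacity, the sketch below estimates usable space under full replication. It assumes three-way replication and ignores metadata, reserved space, and other overhead, so treat the result as an upper bound rather than a sizing guarantee. The function name is hypothetical.

```python
def usable_capacity_gib(datanodes: int, volume_gib: float, replication_factor: int = 3) -> float:
    """Estimate usable capacity when every block is stored replication_factor times.

    Assumption: one volume of volume_gib per datanode, no overhead accounted for.
    """
    if datanodes < replication_factor:
        # Each replica must live on a different datanode.
        raise ValueError(
            f"{datanodes} datanode(s) cannot hold {replication_factor} replicas of each block"
        )
    raw_gib = datanodes * volume_gib
    return raw_gib / replication_factor
```

For example, 3 datanodes with 100 GiB each give 300 GiB of raw space and about 100 GiB usable at a replication factor of 3.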
Installation (OCI)
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-ozone \
  -n <namespace> --create-namespace \
  -f meu-values.yaml
Components
Apache Ozone's four services are exposed through five endpoints (the S3 Gateway has two), each with its own Kubernetes Service and optional Ingress:
| Service | Description | Default port |
|---|---|---|
| S3 Gateway REST (s3g-rest) | S3-compatible endpoint for data operations | 9878 |
| S3 Gateway Web UI (s3g-web) | S3 Gateway web interface | 19878 |
| Ozone Manager (om) | Namespace and metadata manager | 9874 |
| SCM (scm) | Storage Container Manager | 9876 |
| Recon (recon) | Monitoring and administration interface | 9888 |
S3 Authentication
The S3 Gateway uses AWS Signature v4 authentication (simple mode, without Kerberos).
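For readers unfamiliar with Signature v4, the sketch below shows the standard signing-key derivation that S3 clients perform before signing a request. Clients such as the AWS CLI do this automatically; it is shown only to illustrate why the secret key configured on the client must match the one known to the gateway. The function names are illustrative, and the defaults (region us-east-1, service s3) are assumptions matching the AWS CLI setup used on this page.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(
    secret_access_key: str,
    date_stamp: str,  # UTC date in YYYYMMDD form, e.g. "20240131"
    region: str = "us-east-1",
    service: str = "s3",
) -> bytes:
    """Derive the Signature v4 signing key by chaining four HMAC-SHA256 steps."""
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

The derived key is scoped to one day, one region, and one service, and the secret itself is never sent over the wire; only a signature computed with this key accompanies each request.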
Create credentials Secret
kubectl -n <namespace> create secret generic ozone-s3-credentials \
  --from-literal=aws_access_key_id="<AWS_ACCESS_KEY_ID>" \
  --from-literal=aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>"
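If you prefer to manage the Secret declaratively, the sketch below builds an equivalent manifest. The key names (aws_access_key_id, aws_secret_access_key) are the ones used in the command above; the helper names and the idea of generating the manifest from Python are illustrative assumptions, not part of the chart.

```python
import base64
import json


def s3_credentials_secret(
    name: str, namespace: str, access_key_id: str, secret_access_key: str
) -> dict:
    """Build a Kubernetes Secret manifest holding S3 credentials."""

    def b64(value: str) -> str:
        # Values under "data" must be base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "aws_access_key_id": b64(access_key_id),
            "aws_secret_access_key": b64(secret_access_key),
        },
    }


def render(manifest: dict) -> str:
    """Serialize the manifest as JSON, which kubectl apply accepts as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The rendered output can be piped to kubectl apply -f -. Avoid committing the rendered file, since base64 is an encoding and not encryption.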
Reference the Secret in the Helm values file, for example:
ozone:
  s3g:
    auth:
      enabled: true
      secretName: "ozone-s3-credentials"
Configure AWS CLI for access
aws configure set aws_access_key_id <AWS_ACCESS_KEY_ID>
aws configure set aws_secret_access_key <AWS_SECRET_ACCESS_KEY>
aws configure set region us-east-1
# Test (use the Ingress host or the endpoint you exposed)
aws s3 ls --endpoint-url=http://<s3-rest-host>
Persistence
Enable persistence for the components that require storage:
ozone:
  datanode:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 100Gi
  om:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 10Gi
  scm:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 10Gi
Main parameters
| Parameter | Description | Default |
|---|---|---|
| ozone.enabled | Enable Ozone | true |
| ozone.image.tag | Ozone image version | 2.0.0 |
| ozone.datanode.replicas | Number of datanodes | 3 |
| ozone.om.replicas | Ozone Manager replicas | 1 |
| ozone.scm.replicas | SCM replicas | 1 |
| ozone.recon.replicas | Recon replicas | 1 |
| ozone.s3g.replicas | S3 Gateway replicas | 1 |
| ozone.s3g.auth.enabled | Enable S3 authentication | true |
| ozone.s3g.auth.secretName | Name of the Secret with S3 credentials | ozone-s3-credentials |
| ingress.s3g.rest.enabled | Enable S3 Gateway REST Ingress | true |
| ingress.s3g.web.enabled | Enable S3 Gateway Web UI Ingress | true |
| ingress.om.enabled | Enable Ozone Manager Ingress | true |
| ingress.scm.enabled | Enable SCM Ingress | true |
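The dotted names in the table correspond to nested keys in the values file. A minimal sketch of that mapping, assuming the chart follows the usual Helm convention in which each dot separates one level of nesting (the helper name nest_values is hypothetical):

```python
def nest_values(overrides: dict[str, object]) -> dict:
    """Expand dotted parameter names into the nested structure of a values file."""
    root: dict = {}
    for dotted_name, value in overrides.items():
        *parents, leaf = dotted_name.split(".")
        node = root
        for key in parents:
            # Create intermediate levels on demand and descend into them.
            node = node.setdefault(key, {})
        node[leaf] = value
    return root


# Example overrides using parameter names from the table above.
EXAMPLE = nest_values(
    {
        "ozone.datanode.replicas": 3,
        "ozone.s3g.auth.enabled": True,
        "ingress.om.enabled": False,
    }
)
```

Here ozone.s3g.auth.enabled ends up as enabled under auth, under s3g, under ozone, which is the same shape as the S3 authentication snippet earlier on this page.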
See the Ingress — Apache Ozone page to configure external exposure for each service.
Apache Ozone is used as S3-compatible storage by other TDP components. See Integrations — Apache Ozone to configure Spark and Trino with Ozone.
Access via port-forward
# S3 Gateway REST
kubectl -n <namespace> port-forward svc/<release>-s3g-rest 9878:9878
# Ozone Manager UI
kubectl -n <namespace> port-forward svc/<release>-om 9874:9874
# Recon UI
kubectl -n <namespace> port-forward svc/<release>-recon 9888:9888
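Once the port-forwards are running, a quick way to confirm they are reachable is a plain TCP connection test. The sketch below uses only the Python standard library; the local ports mirror the commands above, while the function names and the default of 127.0.0.1 are assumptions for illustration.

```python
import socket

# Local ports used by the port-forward commands on this page.
FORWARDED_PORTS = {
    "S3 Gateway REST": 9878,
    "Ozone Manager UI": 9874,
    "Recon UI": 9888,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_forwards(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe every forwarded port and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in FORWARDED_PORTS.items()}
```

Calling check_forwards() returns a mapping from service name to a boolean. A False value usually means the corresponding kubectl port-forward is not running or the Service name does not match your release.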
Uninstallation
helm uninstall <release> -n <namespace>