Apache Ozone Configuration

The tdp-ozone chart deploys Apache Ozone — S3-compatible object storage — on Kubernetes for TDP.

What is Apache Ozone?

Apache Ozone is a distributed object storage system compatible with the Amazon S3 API.

In TDP Kubernetes, it plays the role of S3: it is where data is stored — Parquet files, Delta Lake tables, Iceberg tables, logs, models.

The main advantage of Ozone over using Amazon S3 directly is that it runs inside the Kubernetes cluster, eliminating dependency on external services and reducing latency and data transfer costs.

Learn more

See Apache Ozone — Concepts for a complete overview of the tool, its architecture and how it works.

How Ozone fits into TDP

Ozone is used as storage by Spark (via the S3A protocol), by Trino (via the native S3 connector), and optionally by ClickHouse.

From the applications' perspective, Ozone is transparent: they point to s3a://bucket/path and the S3 protocol handles the rest.
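
As a rough illustration of what pointing a client at Ozone involves, the sketch below prints the Hadoop S3A options a Spark job would typically pass to reach an S3-compatible endpoint. The function name `s3a_conf_flags` and the example host are hypothetical. The `fs.s3a.*` property names are standard Hadoop S3A settings, but the values that suit a given TDP deployment are an assumption, and credentials are deliberately left out.

```shell
#!/bin/sh
# Sketch: print spark-submit flags that point the S3A connector at an
# S3-compatible endpoint. The endpoint passed in is an assumption.
s3a_conf_flags() {
  endpoint="$1"   # e.g. http://<s3-rest-host>:9878
  printf '%s\n' \
    "--conf spark.hadoop.fs.s3a.endpoint=${endpoint}" \
    "--conf spark.hadoop.fs.s3a.path.style.access=true"
  # Turn TLS off only when the endpoint is plain HTTP.
  case "$endpoint" in
    http://*)
      printf '%s\n' "--conf spark.hadoop.fs.s3a.connection.ssl.enabled=false"
      ;;
  esac
}

# Hypothetical in-cluster host name, used only for illustration.
s3a_conf_flags "http://ozone-s3g.example.internal:9878"
```

The printed flags can be appended to a `spark-submit` command line. Path-style access is set because S3-compatible gateways are usually addressed by path rather than by per-bucket host names.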

Ozone architecture

Ozone is composed of four services that must all be available for the cluster to function. The S3 Gateway exposes two ports (REST and Web UI), so the table below has five rows:

| Service | Role | Port |
| --- | --- | --- |
| S3 Gateway (REST) | S3-compatible endpoint for reading and writing data | 9878 |
| S3 Gateway (Web UI) | S3 Gateway monitoring web interface | 19878 |
| Ozone Manager (OM) | Manages namespaces, volumes, buckets, and metadata | 9874 |
| Storage Container Manager (SCM) | Manages where data blocks reside on datanodes | 9876 |
| Recon | Administrative and cluster monitoring interface | 9888 |

Datanodes are the nodes that actually store the data. The number of datanodes (ozone.datanode.replicas) determines the total storage capacity and the highest replication factor the cluster can support.
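
To make that relationship concrete, here is a back-of-the-envelope estimate: with N datanodes of S Gi each and a replication factor R, usable capacity is roughly N × S / R, and N must be at least R. The function name `usable_gi` is hypothetical, and the replication factor of 3 is an assumption (it is the usual choice for Ratis replication in Ozone, but check the value your cluster uses). Metadata overhead and reserved space are ignored.

```shell
#!/bin/sh
# Rough estimate only: ignores metadata, reserved space and uneven placement.
usable_gi() {
  datanodes="$1"   # number of datanodes (ozone.datanode.replicas)
  size_gi="$2"     # volume size per datanode, in Gi
  factor="$3"      # replication factor (assumed, not read from the chart)
  if [ "$datanodes" -lt "$factor" ]; then
    echo "need at least ${factor} datanodes for replication factor ${factor}" >&2
    return 1
  fi
  echo $(( datanodes * size_gi / factor ))
}

# 3 datanodes x 100Gi with 3 copies of every block
usable_gi 3 100 3   # prints 100
```

Under these assumptions, three 100Gi datanodes hold about 100Gi of user data, and doubling the datanode count doubles that figure.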

Installation (OCI)

Terminal input
helm upgrade --install <release> \
oci://registry.tecnisys.com.br/tdp/charts/tdp-ozone \
-n <namespace> --create-namespace \
-f my-values.yaml

Components

The chart exposes five endpoints across Ozone's four services, each with its own Service and optional Ingress:

| Service | Description | Default port |
| --- | --- | --- |
| S3 Gateway REST (s3g-rest) | S3-compatible endpoint for data operations | 9878 |
| S3 Gateway Web UI (s3g-web) | S3 Gateway web interface | 19878 |
| Ozone Manager (om) | Namespace and metadata manager | 9874 |
| SCM (scm) | Storage Container Manager | 9876 |
| Recon (recon) | Monitoring and administration interface | 9888 |

S3 Authentication

The S3 Gateway uses AWS Signature v4 authentication (simple mode, without Kerberos).

Create credentials Secret

Terminal input
kubectl -n <namespace> create secret generic ozone-s3-credentials \
--from-literal=aws_access_key_id="<AWS_ACCESS_KEY_ID>" \
--from-literal=aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>"

Reference the Secret in the Helm values file, for example:

ozone:
  s3g:
    auth:
      enabled: true
      secretName: "ozone-s3-credentials"

Configure AWS CLI for access

Terminal input
aws configure set aws_access_key_id <AWS_ACCESS_KEY_ID>
aws configure set aws_secret_access_key <AWS_SECRET_ACCESS_KEY>
aws configure set region us-east-1

# Test (use the Ingress host or the endpoint you exposed)
aws s3 ls --endpoint-url=http://<s3-rest-host>
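
A slightly fuller smoke test might create a bucket, upload a file and list it. The sketch below wraps those steps in a dry-run helper so the commands can be previewed before they touch the cluster. The names `run`, `smoke_test`, `DRY_RUN` and the bucket name are hypothetical. `aws s3 mb`, `aws s3 cp`, `aws s3 ls` and `--endpoint-url` are standard AWS CLI features, and the localhost endpoint assumes the S3 Gateway REST port is being port-forwarded.

```shell
#!/bin/sh
# Sketch of an S3 smoke test against the gateway. With DRY_RUN=1 the
# commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

smoke_test() {
  endpoint="$1"               # e.g. http://<s3-rest-host>
  bucket="${2:-smoke-test}"   # hypothetical bucket name
  tmpfile="$(mktemp)"
  echo "hello ozone" > "$tmpfile"
  # Stop at the first failing step; remember the result for the caller.
  run aws s3 mb "s3://${bucket}" --endpoint-url "$endpoint" &&
    run aws s3 cp "$tmpfile" "s3://${bucket}/hello.txt" --endpoint-url "$endpoint" &&
    run aws s3 ls "s3://${bucket}/" --endpoint-url "$endpoint"
  status=$?
  rm -f "$tmpfile"
  return "$status"
}

# Preview the three commands without contacting any endpoint.
DRY_RUN=1 smoke_test "http://localhost:9878"
```

Running `smoke_test` without `DRY_RUN=1` executes the commands for real. The bucket creation step fails if the bucket already exists, so pass a fresh bucket name as the second argument when repeating the test.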

Persistence

Enable persistence for the components that require storage:

ozone:
  datanode:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 100Gi

  om:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 10Gi

  scm:
    persistence:
      enabled: true
      storageClassName: "<storage-class>"
      size: 10Gi
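
Before installing, it can help to check how much persistent storage these settings request from the storage class. The sketch below sums replicas × size for each component. It assumes the chart creates one volume claim per replica, which is typical for StatefulSets but is not confirmed here, and the function name `total_pvc_gi` is hypothetical. The figures use the sizes above and the default replica counts listed under Main parameters.

```shell
#!/bin/sh
# Sum replicas * size (Gi) over "replicas:size" pairs.
total_pvc_gi() {
  total=0
  for pair in "$@"; do
    replicas="${pair%%:*}"   # text before the colon
    size="${pair##*:}"       # text after the colon
    total=$(( total + replicas * size ))
  done
  echo "$total"
}

# datanode 3 x 100Gi, om 1 x 10Gi, scm 1 x 10Gi
total_pvc_gi 3:100 1:10 1:10   # prints 320
```

With the values shown, the storage class would need to provision about 320Gi in total, most of it for the datanodes.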

Main parameters

| Parameter | Description | Default |
| --- | --- | --- |
| ozone.enabled | Enable Ozone | true |
| ozone.image.tag | Ozone image version | 2.0.0 |
| ozone.datanode.replicas | Number of datanodes | 3 |
| ozone.om.replicas | Ozone Manager replicas | 1 |
| ozone.scm.replicas | SCM replicas | 1 |
| ozone.recon.replicas | Recon replicas | 1 |
| ozone.s3g.replicas | S3 Gateway replicas | 1 |
| ozone.s3g.auth.enabled | Enable S3 authentication | true |
| ozone.s3g.auth.secretName | Name of the Secret with S3 credentials | ozone-s3-credentials |
| ingress.s3g.rest.enabled | Enable S3 Gateway REST Ingress | true |
| ingress.s3g.web.enabled | Enable S3 Gateway Web UI Ingress | true |
| ingress.om.enabled | Enable Ozone Manager Ingress | true |
| ingress.scm.enabled | Enable SCM Ingress | true |

Ingress

See the Ingress — Apache Ozone page to configure external exposure for each service.

Integrations

Apache Ozone is used as S3-compatible storage by other TDP components. See Integrations — Apache Ozone to configure Spark and Trino with Ozone.

Access via port-forward

Terminal input
# S3 Gateway REST
kubectl -n <namespace> port-forward svc/<release>-s3g-rest 9878:9878

# Ozone Manager UI
kubectl -n <namespace> port-forward svc/<release>-om 9874:9874

# Recon UI
kubectl -n <namespace> port-forward svc/<release>-recon 9888:9888

Uninstallation

Terminal input
helm uninstall <release> -n <namespace>