Kubernetes

Container Orchestration

Kubernetes (often abbreviated as K8s) is an open-source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications.

Originally developed by Google and now maintained by the Cloud Native Computing Foundation (CNCF), Kubernetes has become the de facto standard for container orchestration in production environments.

In TDP Kubernetes, Kubernetes is the foundation for deploying all components of the cloud-native edition of TDP, replacing the bare-metal and virtual machine approach used in the TDP Datacenter edition.

Why Kubernetes?

Adopting Kubernetes as the deployment platform for TDP brings several benefits:

Automatic scalability: components can scale horizontally according to demand, without manual intervention
High availability: when a node fails, Kubernetes automatically reschedules the affected components onto healthy nodes
Declarative deployment: the entire infrastructure is described as code (YAML), enabling versioning and reproducibility
Resource isolation: each component runs in its own container, with defined CPU and memory limits
Zero-downtime updates: rolling updates and rollbacks are native to the platform
Rich ecosystem: integration with tools such as Helm, ArgoCD, Prometheus, and Grafana

Fundamental Concepts

Pods

The Pod is the smallest deployment unit in Kubernetes.

A Pod encapsulates one or more containers that share the same network namespace and storage.

In TDP Kubernetes, each component (such as Kafka, Trino, or Airflow) runs in one or more Pods.

Element	Description
Containers	One or more containers in the same Pod, sharing its network and volumes
Network	A single IP per Pod; containers communicate via `localhost`
Storage	Volumes shared between the Pod's containers
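As a minimal sketch of the elements in the table above, the following Python snippet builds a Pod manifest as a plain dictionary and prints it as JSON, a format that `kubectl apply -f` accepts alongside YAML. The Pod name, container names, images, and mount path are hypothetical placeholders and are not taken from any TDP chart.

```python
import json


def build_pod(name, namespace):
    """Return a Pod manifest with two containers sharing one emptyDir volume."""
    # Both containers mount the same volume, so files written by one are visible to the other.
    shared_mount = {"name": "shared-data", "mountPath": "/data"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [
                # Hypothetical images, used only for illustration.
                {"name": "app", "image": "example/app:1.0", "volumeMounts": [shared_mount]},
                {"name": "sidecar", "image": "example/sidecar:1.0", "volumeMounts": [shared_mount]},
            ],
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
        },
    }


pod = build_pod("example-pod", "tdp-project")
print(json.dumps(pod, indent=2))
```

Because both containers belong to the same Pod, they also share one IP address and can reach each other through `localhost`.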

Deployments and StatefulSets

Kubernetes uses controllers to keep Pods running, recreate them in case of failure, and apply updates in a controlled way.

Deployment: used for workloads where Pods are interchangeable. A Pod can be replaced by another one without preserving the same name, IP address, or dedicated volume. It is commonly used by components such as Trino, Superset, CloudBeaver, and parts of Airflow, depending on the chart architecture.
StatefulSet: used for workloads that require stable identity. Each Pod keeps a predictable name, its own network identity, and, when configured, an associated persistent volume. It is commonly used by components such as Kafka, PostgreSQL, ClickHouse, and NiFi.

Aspect	Deployment	StatefulSet
Main use case	Interchangeable Pods	Pods with stable identity
Pod identity	Not preserved across recreations	Preserved across recreations
Pod names	Automatically generated	Ordered and predictable, such as `name-0`, `name-1`
Storage	Can use volumes, but there is no fixed per-Pod binding	Can keep a persistent volume associated with each Pod
Scaling	Replicas are equivalent to each other	Replicas have their own identity and controlled ordering
TDP examples	Trino, Superset, CloudBeaver, parts of Airflow	Kafka, PostgreSQL, ClickHouse, NiFi
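The stable identity of a StatefulSet can be illustrated with a short sketch. It derives the ordered Pod names and the per-Pod DNS names that Kubernetes exposes through a headless Service. The StatefulSet name `kafka`, the Service name `kafka-headless`, and the namespace are assumptions made for the example; the actual names depend on the chart.

```python
from typing import List


def statefulset_pod_names(name: str, replicas: int) -> List[str]:
    """Pods of a StatefulSet are named <statefulset-name>-<ordinal>, starting at 0."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


def statefulset_pod_dns(name: str, headless_service: str, namespace: str,
                        replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Each Pod gets a stable DNS name under the governing headless Service."""
    return [
        f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for pod in statefulset_pod_names(name, replicas)
    ]


for dns_name in statefulset_pod_dns("kafka", "kafka-headless", "tdp-project", 3):
    print(dns_name)
```

A Deployment offers no equivalent: its Pod names carry generated suffixes that change whenever a Pod is recreated, which is why clients reach those Pods through a Service rather than by Pod name.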

Services

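Services expose Pods for internal (ClusterIP) or external (NodePort, LoadBalancer) communication. HTTP/HTTPS routing through Ingress or the Gateway API is not a Service type; these are separate resources that forward traffic to Services. In TDP Kubernetes, each component has its own Service for discovery and communication between components.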

Type	Use
ClusterIP	Access only within the cluster (default); used for communication between components (e.g. Trino → Hive Metastore).
NodePort	Exposes the application on a port on each node; useful for testing or clusters without LoadBalancer.
LoadBalancer	Exposes via the cloud provider's load balancer (or MetalLB on-premise).
Ingress / Gateway API	HTTP/HTTPS routing by host and path; used for UIs (Airflow, Superset, ArgoCD, etc.).
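To make the Service types above more concrete, here is a small sketch that builds a Service manifest and the DNS name other components would use to reach it inside the cluster. The component name, the port, and the `app.kubernetes.io/name` selector label are assumptions for illustration; the real values are defined by each chart.

```python
from typing import Dict

# The Service types discussed above (Kubernetes also defines ExternalName, omitted here).
SERVICE_TYPES = {"ClusterIP", "NodePort", "LoadBalancer"}


def build_service(name: str, namespace: str, port: int,
                  service_type: str = "ClusterIP") -> Dict:
    """Return a Service manifest that selects Pods labelled with the component name."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unsupported Service type: {service_type}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": service_type,
            # Assumption: the target Pods carry this label.
            "selector": {"app.kubernetes.io/name": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def service_dns(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name of a Service: <service>.<namespace>.svc.<cluster-domain>."""
    return f"{name}.{namespace}.svc.{cluster_domain}"


service = build_service("hive-metastore", "tdp-project", 9083)
print(service["spec"]["type"], service_dns("hive-metastore", "tdp-project"))
```

With a ClusterIP Service like this one, a client such as Trino would address the Hive Metastore by DNS name instead of by Pod IP, so Pod restarts do not break the connection.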

Ingress and Gateway API

TDP Kubernetes can be exposed over HTTP/HTTPS using two approaches: Ingress or Gateway API.

The Ingress API remains available and supported in Kubernetes, but it is frozen: no new features are planned for it. New HTTP/HTTPS routing features are being added to the Gateway API instead.

It is important to differentiate the Ingress API from its controllers. The retirement announced for March 2026 refers to the ingress-nginx community controller, not the removal of the Ingress API from Kubernetes.

For this reason, TDP Kubernetes documents both approaches. Environments that already use Ingress can continue using this option, provided they adopt a compatible and maintained controller. For new deployments or architecture evolutions, the Gateway API should be considered the preferred approach.
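For the Gateway API approach, routing by host and path is expressed in an HTTPRoute resource attached to a Gateway. The sketch below builds such a manifest as a Python dictionary. The Gateway name, hostname, backend Service name, and port are hypothetical; an actual installation would take them from the environment and the component chart.

```python
import json
from typing import Dict


def build_http_route(name: str, namespace: str, gateway: str,
                     hostname: str, service: str, port: int) -> Dict:
    """Return an HTTPRoute that sends all paths of one hostname to one Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Gateway that accepts the traffic (hypothetical name).
            "parentRefs": [{"name": gateway}],
            "hostnames": [hostname],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


route = build_http_route(
    "superset", "tdp-project", "tdp-gateway", "superset.example.com", "superset", 8088
)
print(json.dumps(route, indent=2))
```

An Ingress resource would express the same host and path rule with a different schema, which is why environments can migrate from one approach to the other without changing the backend Services.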

Namespaces

Namespaces provide logical isolation within the cluster. TDP Kubernetes uses a single namespace to group components; in the documentation this namespace is indicated as <NAMESPACE>.

Note: <NAMESPACE> is defined by the user. It may be, for example, tdp-ingestao, tdp-financeiro, departamento-pessoal, or any other name. The -project suffix in examples such as tdp-project is only illustrative in this documentation; the user chooses the name they want.
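Whatever name is chosen, Kubernetes requires namespace names to be valid DNS labels: at most 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit. A quick check before installation can be sketched as follows; the function name is illustrative and not part of any TDP tool.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_namespace(name: str) -> bool:
    """Return True if the name satisfies the Kubernetes namespace naming rules."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


for candidate in ["tdp-ingestao", "departamento-pessoal", "TDP_Project", "-tdp"]:
    print(candidate, is_valid_namespace(candidate))
```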

Persistent Volumes

Persistent Volumes (PV) and Persistent Volume Claims (PVC) provide persistent storage for components that need to retain data across restarts, such as databases (PostgreSQL, ClickHouse) and messaging systems (Kafka).

The PVC is the storage request referenced by the Pod (for example, "I need 10Gi with the ReadWriteOnce access mode"); the cluster fulfils this request by binding the PVC to an existing PV or by dynamically provisioning a new volume through a StorageClass (for example, local-path or cloud provisioners).

The figure below illustrates this flow: the Pod references a PVC; the PVC is fulfilled by the StorageClass, which provisions the volume on the node's physical disk (or on external storage, depending on the configured StorageClass).

[Figure: Persistent Storage Flow in Kubernetes]
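The request side of this flow can be sketched in a few lines. The snippet builds a PersistentVolumeClaim manifest and the volume entry a Pod would use to reference it. The claim name, size, and StorageClass are example values; when no StorageClass is given, the field is left out so that the cluster default applies.

```python
from typing import Dict, Optional


def build_pvc(name: str, namespace: str, size: str = "10Gi",
              storage_class: Optional[str] = None,
              access_mode: str = "ReadWriteOnce") -> Dict:
    """Return a PersistentVolumeClaim manifest requesting `size` of storage."""
    spec = {
        "accessModes": [access_mode],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Omitting this field lets the cluster's default StorageClass handle the claim.
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def pod_volume_for(claim_name: str, volume_name: str = "data") -> Dict:
    """Volume entry for a Pod spec that points at the claim."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


pvc = build_pvc("postgresql-data", "tdp-project", storage_class="local-path")
print(pvc["spec"], pod_volume_for("postgresql-data"))
```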

ConfigMaps and Secrets

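ConfigMaps: store non-sensitive configuration in key-value format, used to parameterize components.
Secrets: store sensitive data such as passwords, tokens, and certificates. Secret values are base64-encoded in the API, which is an encoding and not encryption; encryption at rest has to be enabled separately in the cluster, and access should be restricted through RBAC.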

Resource	Typical content	Use in TDP
ConfigMap	URLs, endpoints, flags, configuration files	Connection parameters (e.g. Trino catalogs), non-sensitive configs
Secret	Passwords, API keys, TLS certificates	Database credentials, OCI registry, LDAP, tokens
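The difference between the two resources is easiest to see in how values are stored. In a Secret manifest, the `data` field holds base64-encoded values, which anyone with read access can decode without a key. The sketch below builds a Secret and reverses the encoding; the Secret name and the credential are placeholders.

```python
import base64
from typing import Dict


def build_secret(name: str, namespace: str, values: Dict[str, str]) -> Dict:
    """Return an Opaque Secret manifest with base64-encoded values in `data`."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


def decode_secret(secret: Dict) -> Dict[str, str]:
    """Decode the `data` field back to plain text; no key is needed."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret["data"].items()
    }


secret = build_secret("db-credentials", "tdp-project", {"password": "changeme"})
print(secret["data"])
print(decode_secret(secret))
```

This is why Secrets are usually protected with RBAC and, where required, with encryption at rest or an external secret manager.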

TDP Kubernetes Deployment Model

Helm Charts

Helm is the Kubernetes package manager, and TDP Kubernetes uses Helm Charts to package, configure, and install each platform component.

Each chart contains:

Kubernetes resource templates (Deployments, Services, ConfigMaps, etc.)
Configurable default values (values.yaml)
Dependency declarations between charts
Metadata and documentation

In TDP Kubernetes, Helm is generally adopted for components whose lifecycle can be adequately handled by native Kubernetes resources, with less need for specialized operational automation.

TDP Kubernetes charts are available in the Tecnisys OCI registry (registry.tecnisys.com.br) and follow the naming convention tdp-<component> (for example, tdp-kafka, tdp-trino, tdp-airflow).
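The role of the default values can be illustrated with a sketch of how overrides combine with them. Helm merges user-supplied values over the chart's values.yaml: nested maps are merged key by key, while scalars and lists are replaced. The function below imitates that behaviour in plain Python; the keys shown are hypothetical and do not come from a TDP chart.

```python
import copy
from typing import Dict


def merge_values(defaults: Dict, overrides: Dict) -> Dict:
    """Return defaults with overrides applied; nested dictionaries are merged recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the default entirely.
            merged[key] = copy.deepcopy(value)
    return merged


chart_defaults = {
    "replicaCount": 1,
    "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
}
user_values = {
    "replicaCount": 3,
    "resources": {"limits": {"memory": "4Gi"}},
}
print(merge_values(chart_defaults, user_values))
```

Only the keys that differ from the defaults need to appear in the override file, which keeps environment-specific configuration small and easy to review.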

OCI Registry

In TDP Kubernetes, artifacts such as Helm Charts and container images can be distributed through an OCI-compatible registry. This model standardizes the storage, versioning, and distribution of packages used by the platform.

Using a centralized registry facilitates:

controlled publication and distribution of artifacts;
consistent versioning of charts and images;
integration with installation and automation flows;
support for CLI-based and GitOps approaches.

Details on authentication, registry access, and operational commands are covered in the installation sections.

Operators

Operators extend Kubernetes with application-specific operational logic. They observe the actual state of resources in the cluster, compare it with the desired configuration, and perform reconciliation actions when required.

In practice, an Operator automates tasks such as creating derived resources, performing updates, recovering from failures, adjusting configuration, managing replicas, handling persistent resources, and maintaining the application state.
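The observe, compare, and act cycle described above can be sketched with in-memory dictionaries standing in for the cluster. This is a deliberately simplified model, not how any specific Operator is implemented: a real Operator watches Custom Resources through the Kubernetes API. The resource names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Compare desired and actual state and list the actions needed to converge."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Apply the planned actions to `actual` and return what was done."""
    actions = plan(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actions


desired = {"broker-0": {"storage": "10Gi"}, "broker-1": {"storage": "10Gi"}}
actual = {"broker-0": {"storage": "5Gi"}, "broker-9": {"storage": "10Gi"}}
first_pass = reconcile(desired, actual)
second_pass = reconcile(desired, actual)
print(first_pass)
print(second_pass)
```

The second pass finds nothing to do, which reflects the key property of reconciliation: it is idempotent and can run repeatedly without side effects once the states match.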

In TDP Kubernetes, Operators are used by components whose architecture requires specialized reconciliation through Custom Resources. In these cases, Helm remains the mechanism for installing and parametrizing the chart, while the Operator is responsible for keeping the application resources reconciled in the cluster.

This approach is adopted, for example, by components such as Kafka, NiFi, and ClickHouse, where the application lifecycle is managed through Custom Resources and specialized controllers.

The CRDs required by components based on Custom Resources are centralized in the tdp-crds chart, which must be installed before charts that create those resources.
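The ordering constraint can be expressed as a small helper that places tdp-crds ahead of every other chart in an installation list. This is only an illustration of the rule; the helper is hypothetical and not part of any TDP tool.

```python
from typing import List

CRDS_CHART = "tdp-crds"


def install_order(charts: List[str]) -> List[str]:
    """Return the charts with tdp-crds first; the relative order of the others is kept."""
    others = [chart for chart in charts if chart != CRDS_CHART]
    crds_first = [CRDS_CHART] if CRDS_CHART in charts else []
    return crds_first + others


print(install_order(["tdp-kafka", "tdp-crds", "tdp-trino"]))
```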

In OpenShift environments, Operators follow the same Kubernetes model. OpenShift acts as the cluster execution platform, while the Operator performs reconciliation of the application resources. When the platform requires specific security or runtime configurations, the component chart handles these adaptations, such as settings for SCC, RoleBindings, arbitrary UID execution, and securityContext.

[Figure: TDP Operator Automation Pattern]

Deployment Strategy: Helm and Operators

TDP Kubernetes uses Helm as the main mechanism for packaging, configuring, and installing platform components.

Helm charts define the Kubernetes resources required by each component, such as Deployments, StatefulSets, Services, Secrets, ConfigMaps, Jobs, Ingresses, HTTPRoutes, RBAC, and, for Operator-based components, Custom Resources.

For components where TDP Kubernetes adopts Operators, the Helm chart delivers the required resources and parametrizes the installation, while the Operator keeps the application state reconciled in the cluster.

In general:

Helm is the installation and configuration mechanism for TDP Kubernetes charts;
specific components use Operators and Custom Resources as part of their architecture;
the CRDs used by these components are installed beforehand through the tdp-crds chart;
in OpenShift, deployment follows the Kubernetes model, with security and runtime adaptations handled by the component charts when required.

Therefore, Helm and Operators act in a complementary way in TDP Kubernetes: Helm delivers the parametrized installation, and the Operator, for components that use it, performs the operational reconciliation of the application.

ArgoCD and GitOps

GitOps

GitOps is an operational model in which the desired state of the environment is defined declaratively in Git repositories. Changes become versioned, auditable, and continuously reconciled in the cluster.

This approach improves operational predictability and change traceability, and reduces direct manual changes to the environment.

GitOps Principles

The core principles of GitOps include:

Single source of truth: the desired state of the platform is maintained in Git repositories;
Declarative configuration: resources are defined by versionable manifests and files;
Versioning and traceability: every change is recorded and auditable;
Automatic reconciliation: the environment is continuously compared with the desired state and adjusted when necessary;
Predictable operation: deployments, changes, and rollbacks follow a controlled and reproducible flow.
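The automatic reconciliation principle depends on detecting drift between what Git declares and what is running. A simplified field-level comparison can be sketched as below. It checks only the fields present in the desired state, so values added by the cluster (such as status) are ignored. This is an illustration of the idea, not the algorithm used by any particular GitOps tool.

```python
from typing import Any, Dict, List, Tuple


def find_drift(desired: Dict, live: Dict, prefix: str = "") -> List[Tuple[str, Any, Any]]:
    """Return (path, desired_value, live_value) for every declared field that differs."""
    drift = []
    for key, want in desired.items():
        path = f"{prefix}.{key}" if prefix else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, path))
        elif want != have:
            drift.append((path, want, have))
    return drift


desired_state = {"spec": {"replicas": 3, "template": {"image": "example/app:1.0"}}}
live_state = {
    "spec": {"replicas": 2, "template": {"image": "example/app:1.0"}},
    "status": {"readyReplicas": 2},
}
print(find_drift(desired_state, live_state))
```

An empty result means the environment matches the repository; any entry is a candidate for automatic correction or for review, depending on the sync policy in use.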

ArgoCD

ArgoCD is a declarative continuous delivery tool for Kubernetes based on GitOps principles. In this model, the desired application state is described in Git repositories, and the environment is continuously reconciled with that state.

In TDP Kubernetes, this approach enables:

change traceability;
greater deployment predictability;
declarative control of environments;
reduction of direct manual changes to the cluster.
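In ArgoCD, this link between a Git repository and a cluster destination is declared in an Application resource. The sketch below builds one as a Python dictionary with automated sync enabled. The repository URL, path, revision, and the `argocd` namespace are assumptions for the example and will differ per environment.

```python
import json
from typing import Dict


def build_argocd_application(name: str, repo_url: str, path: str,
                             target_revision: str, dest_namespace: str,
                             argocd_namespace: str = "argocd") -> Dict:
    """Return an ArgoCD Application that keeps one Git path synced to one namespace."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": argocd_namespace},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "path": path,
                "targetRevision": target_revision,
            },
            "destination": {
                # In-cluster API server address used when ArgoCD deploys to its own cluster.
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Automated sync: remove resources deleted from Git and revert manual changes.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


app = build_argocd_application(
    "tdp-trino",
    "https://git.example.com/platform/tdp-config.git",  # hypothetical repository
    "environments/dev/trino",                            # hypothetical path
    "main",
    "tdp-project",
)
print(json.dumps(app, indent=2))
```

With `selfHeal` enabled, a manual change made directly in the cluster is reverted to the state in Git, which is the mechanism behind the reduction of direct manual changes listed above.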

TDP Kubernetes Architecture

The TDP Kubernetes architecture can be viewed from two complementary perspectives.

The first presents an overview of the solution deployment, including the origin of artifacts, the Kubernetes cluster, exposure mechanisms, and integration with storage.

The second details the functional organization of the platform components, grouping services by logical responsibility layers.

The TDP Kubernetes edition organizes its components into logical layers, according to the function performed by each service within the platform.

[Figure: TDP Kubernetes Architecture]

Layer	Components	Function
Platform Infrastructure	ArgoCD, PostgreSQL	Platform operation support, management, and shared services
Messaging	Apache Kafka	Event streaming and data integration
Metadata and Table Storage	Hive Metastore, Delta Lake, Apache Iceberg	Metadata and open table formats
Object Storage	Apache Ozone	Distributed, S3-compatible object storage for on-premises environments
Security	Apache Ranger	Access control and security policies
Ingestion	Apache NiFi	Data ingestion and transformation
Processing	Apache Spark, Trino	Batch processing, streaming, and SQL queries
Orchestration and Development	Apache Airflow, JupyterHub	Pipeline orchestration and notebooks
Analytics and Access	ClickHouse, Apache Superset, CloudBeaver	OLAP, visualization, and data administration
Governance	OpenMetadata	Data catalog and lineage

Platform Operation

The operation of the TDP Kubernetes platform can be understood as a flow: from the definition of the desired state, through the execution of components, to the final consumption of deployed services.

1. Control and Deployment

Deployment and environment management can take place through different approaches, depending on the operational model adopted:

TDPCTL (recommended): automation tool used to reduce manual configuration and deployment steps.
ArgoCD with Git: GitOps flow in which the desired state of the environment is maintained in a Git repository and continuously reconciled in the cluster.
Deploy from versioned artifacts: components can also be deployed from charts and images made available in an OCI-compatible registry.

Note: The operational details of these approaches are described in the installation and configuration sections.

2. Artifact Distribution

Container images and Helm Charts for the platform are distributed through an OCI-compatible registry.

This model enables:

centralized versioning of artifacts;
standardized distribution of images and charts;
integration with automated installation and GitOps flows;
access control according to the environment.

Note: Registry address, authentication, SSO, and access procedures are covered in the prerequisites and installation pages.

3. Component Execution

After deployment, the Kubernetes cluster runs the platform components as containerized workloads, using native scheduling, networking, scaling, and persistence mechanisms.

In this model:

stateless components tend to be managed by resources such as Deployments;
stateful components tend to use StatefulSets and persistent volumes;
communication between components takes place through Services;
web interfaces can be exposed through the Gateway API, Ingress, or other mechanisms supported by the environment.

[Figure: Kubernetes deployment model]

4. Data Persistence and Access

The platform's data layer may use external storage services or resources provided by the solution itself, depending on the environment architecture.

In general:

S3-compatible services may be used when they already exist in the infrastructure;
in on-premises scenarios, or where no compatible external storage is available, Apache Ozone can act as the object storage layer;
components that require persistence use Persistent Volumes and Persistent Volume Claims, according to the StorageClass available in the cluster.

This model preserves deployment flexibility and compatibility between the components of the stack.

Next Steps

To continue with the adoption or operation of TDP Kubernetes, see also:
