Version Next

Installation via Argo CD (GitOps)

This section describes how to install TDP Kubernetes components using Argo CD as a GitOps tool.

tip

Before starting the installation via Argo CD, make sure the prerequisites are met and Argo CD is already installed on the cluster.
If Argo CD is not installed yet, follow steps 1 and 2 of the Helm installation procedure.

Overview

Argo CD is a continuous delivery (CD) tool based on the GitOps model.
In this model, you describe the infrastructure you want — which components, which version, which settings — in YAML files called manifests, and store them in a Git repository.
Argo CD watches that repository and keeps the Kubernetes cluster aligned with what is declared in those files.

Differences between installing with Argo CD and Helm

Criterion | Helm CLI | Argo CD (GitOps)
--- | --- | ---
Traceability | Manual commands, no automatic history | Every change is recorded in Git
Automation | Manual execution per chart | Continuous automatic sync
Rollback | Re-running commands | One click or one command
Visibility | Terminal only | Web UI with live status
Audit | Difficult | Full, via Git history

Essential concepts for installation with Argo CD

Before you start, it is important to understand the core concepts of this installation:

Manifest: A YAML file that describes a Kubernetes or Argo CD resource.
In this context, each manifest defines an Argo CD Application, i.e. it tells Argo CD to install a given Helm chart.
You create these files from the templates in this documentation and store them in your GitOps Git repository.

Application: The central object in Argo CD.
An Application tells Argo CD to download a given Helm chart from the registry and install it in a given cluster namespace.
Each TDP component (Kafka, Airflow, Trino, etc.) is represented by a separate Application.

Repository: A source of artifacts for Argo CD.
It can be a Git repository (where your manifests live) or an OCI registry[1] (where the Tecnisys Helm charts are hosted).
The OCI registry must be registered in Argo CD before you create Applications, and so must the Git repository if it is private.

Sync: The process by which Argo CD compares the desired state declared in the manifests with the actual cluster state.
If the two differ (drift), Argo CD applies the required changes, either automatically (if configured) or on demand.

App of Apps: A pattern for managing all components through a single root Application.
Instead of applying many manifests one by one, you apply one file that tells Argo CD to discover and install all the others automatically.

Installation order

Components should be installed in a specific order so dependencies are satisfied:

Order | Chart | Component | Version | Dependencies
--- | --- | --- | --- | ---
1 | tdp-postgresql | PostgreSQL | 17.5.0 | None
2 | tdp-kafka | Kafka (Strimzi) | 4.1.0 | None
3 | tdp-hive-metastore | Hive Metastore | 4.0.0 | None
4 | tdp-deltalake | Delta Lake | 4.0.0 | None
5 | tdp-iceberg | Iceberg | 1.10.0 | None
6 | tdp-ozone | Apache Ozone | 2.0.0 | None
7 | tdp-ranger | Ranger | 2.7.0 | tdp-postgresql
8 | tdp-nifi | NiFi | 1.28.0 | None
9 | tdp-spark | Spark | 4.0.0 | None
10 | tdp-trino | Trino | 478 | None
11 | tdp-airflow | Airflow | 3.0.2 | tdp-postgresql
12 | tdp-jupyter | JupyterHub | 5.3.0 | None
13 | tdp-clickhouse | ClickHouse | 25.8.11.66 | None
14 | tdp-cloudbeaver | CloudBeaver | 25.2.3 | tdp-postgresql
15 | tdp-openmetadata | OpenMetadata | 1.9.11 | tdp-postgresql
16 | tdp-superset | Superset | 5.0.0 | tdp-postgresql

The Version column refers to the component version (e.g. Kafka 4.1.0, Airflow 3.0.2). The Helm chart version is 3.0.0 for all TDP charts.
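If you add components or change the order, a small script can check the table's dependency rule for you. The following is an illustrative sketch and is not part of the TDP tooling. The DEPENDENCIES map is transcribed from the table above, the function names are hypothetical, and it uses the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Dependencies transcribed from the "Installation order" table (chart -> prerequisites).
DEPENDENCIES = {
    "tdp-postgresql": set(),
    "tdp-kafka": set(),
    "tdp-hive-metastore": set(),
    "tdp-deltalake": set(),
    "tdp-iceberg": set(),
    "tdp-ozone": set(),
    "tdp-ranger": {"tdp-postgresql"},
    "tdp-nifi": set(),
    "tdp-spark": set(),
    "tdp-trino": set(),
    "tdp-airflow": {"tdp-postgresql"},
    "tdp-jupyter": set(),
    "tdp-clickhouse": set(),
    "tdp-cloudbeaver": {"tdp-postgresql"},
    "tdp-openmetadata": {"tdp-postgresql"},
    "tdp-superset": {"tdp-postgresql"},
}

# Dicts keep insertion order, so this is the order in which the table lists the charts.
INSTALL_ORDER = list(DEPENDENCIES)


def violations(order, deps):
    """Return (chart, prerequisite) pairs where the prerequisite is missing
    from the order or comes after the chart that needs it."""
    position = {chart: i for i, chart in enumerate(order)}
    return [
        (chart, dep)
        for chart in order
        for dep in sorted(deps.get(chart, ()))
        if dep not in position or position[dep] > position[chart]
    ]


def valid_order(deps):
    """Compute one dependency-respecting order (not necessarily the table's)."""
    return list(TopologicalSorter(deps).static_order())


print(violations(INSTALL_ORDER, DEPENDENCIES))  # prints []
```

An empty list means every chart's prerequisites appear earlier in the order. valid_order shows one alternative ordering that would also satisfy the dependencies.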

note

Argo CD (including the required CRDs) is a prerequisite for this procedure: the cluster must already have Argo CD running (see Helm installation).
The tdp-argo-crds and tdp-argo charts are not part of this sequence because they are not installed by this GitOps flow.

warning

tdp-postgresql is a dependency and must be running before the components that use it: Airflow, OpenMetadata, Ranger, Superset, and CloudBeaver.
With the App of Apps pattern, Argo CD creates all Applications at once, so those components may fail temporarily until PostgreSQL is ready. They are expected to recover once it is available, with selfHeal[2] keeping each Application reconciled with Git.


Step 1 – Connect the OCI registry to Argo CD

First, tell Argo CD where the TDP Helm charts are.
Charts are hosted in Tecnisys’s private OCI registry (registry.tecnisys.com.br).
To let Argo CD download them, register that address as a Helm repository with OCI support.
This registration does not install anything — it only allows Argo CD to use the registry as an artifact source.
Obtain access credentials from Tecnisys.

1. Open the Argo CD UI in your browser.
The URL depends on how HTTP exposure is configured in your environment.
If the environment uses Ingress, find the exposed host with:

Terminal input
kubectl get ingress -n argocd

Example output:

NAME                CLASS   HOSTS                      PORTS
tdp-argocd-server   nginx   argocd.<cluster-domain>    80
Figure 1 - Find the exposed host - get ingress

Open https://argocd.<cluster-domain> in the browser.

Figure 2 - Argo CD access
Gateway API

If the environment uses Gateway API instead of Ingress, find the exposed host with:

Terminal input
kubectl get httproute -n argocd

See Ingress vs Gateway API for more details on both supported approaches.

2. In the sidebar, go to Settings → Repositories.

3. Click Connect Repo and fill in:

Field | Value
--- | ---
Repository type | Helm
Repository URL | registry.tecnisys.com.br
Enable OCI | true (check the box)
Username | <username>
Password | <password>

4. Click Connect.
The repository should appear in the list with a successful connection status.

Figure 3 - Repo Connect

Step 2 – Create the GitOps repository

You need a Git repository to store Application manifests.
That repository is the source of truth for your environment: Argo CD watches it continuously and applies any change pushed to it.

note

Create this Git repository on your preferred platform (GitHub, GitLab, Gitea, etc.) and grant Argo CD read access.
The repository may be private — in that case, register it in Argo CD under Settings → Repositories as well, same as for the OCI registry, but choose type Git and provide credentials.

Organize the repository like this:

tdp-gitops/
├── app-of-apps.yaml        # Root Application; apply this file to bootstrap everything
├── apps/                   # One manifest per TDP component
│   ├── tdp-postgresql.yaml
│   ├── tdp-kafka.yaml
│   ├── tdp-hive-metastore.yaml
│   ├── tdp-deltalake.yaml
│   ├── tdp-iceberg.yaml
│   ├── tdp-ozone.yaml
│   ├── tdp-ranger.yaml
│   ├── tdp-nifi.yaml
│   ├── tdp-spark.yaml
│   ├── tdp-trino.yaml
│   ├── tdp-airflow.yaml
│   ├── tdp-jupyter.yaml
│   ├── tdp-clickhouse.yaml
│   ├── tdp-cloudbeaver.yaml
│   ├── tdp-openmetadata.yaml
│   └── tdp-superset.yaml
└── values/                 # Optional per-component overrides
    ├── tdp-postgresql-values.yaml
    ├── tdp-kafka-values.yaml
    └── ...

note

What you create: the YAML files under apps/ (Application manifests) and app-of-apps.yaml.
Templates are provided in Step 3 (Application manifests) and Step 4 (app-of-apps.yaml).

The values/ files are optional and let you customize each component — passwords, resource limits, or other parameters.
If omitted, each component uses the chart defaults.

The Helm charts themselves are provided by Tecnisys in the OCI registry.
You do not need to create or modify the charts.

Step 3 – Create Application manifests

Each TDP Kubernetes component is a YAML file defining an Argo CD Application.
Copy the templates below into the matching files under apps/ in your repository, replacing <namespace> with your target namespace (e.g. tdp-production, tdp-dev, my-namespace).
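Most templates in this step differ only in the chart name and the target namespace, so you may prefer to generate the files rather than copy them by hand. The following is a minimal sketch with hypothetical function names. It renders the common template shared by most components. The tdp-kafka template below also sets helm.valueFiles, so adjust that file afterwards if you keep that setting.

```python
from pathlib import Path

# Chart names from the installation order table; all use chart version 3.0.0.
CHARTS = [
    "tdp-postgresql", "tdp-kafka", "tdp-hive-metastore", "tdp-deltalake",
    "tdp-iceberg", "tdp-ozone", "tdp-ranger", "tdp-nifi",
    "tdp-spark", "tdp-trino", "tdp-airflow", "tdp-jupyter",
    "tdp-clickhouse", "tdp-cloudbeaver", "tdp-openmetadata", "tdp-superset",
]

# Same structure as the templates in this step; {chart} and {namespace} are filled in.
TEMPLATE = """\
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: {chart}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: {chart}
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: {namespace}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
"""


def render(chart: str, namespace: str) -> str:
    """Return the Application manifest for one chart as YAML text."""
    return TEMPLATE.format(chart=chart, namespace=namespace)


def write_manifests(repo_root: Path, namespace: str) -> list[Path]:
    """Write apps/<chart>.yaml for every chart and return the written paths."""
    apps_dir = repo_root / "apps"
    apps_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for chart in CHARTS:
        path = apps_dir / f"{chart}.yaml"
        path.write_text(render(chart, namespace), encoding="utf-8")
        written.append(path)
    return written
```

For example, write_manifests(Path("tdp-gitops"), "tdp-production") writes tdp-gitops/apps/tdp-postgresql.yaml and the other 15 files. Review and commit them as usual.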

Understand the Manifest

Before copying templates, understand each field:

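Field | Meaning
--- | ---
metadata.name | Application name in Argo CD (must be unique)
metadata.namespace | Namespace where Argo CD runs, usually argocd
spec.project | Argo CD project the Application belongs to (default in these templates)
spec.source.repoURL | OCI registry URL where the chart is stored
spec.source.chart | Name of the Helm chart to install
spec.source.targetRevision | Chart version; use 3.0.0 for all TDP charts
spec.source.helm.valueFiles | Helm value files to apply (optional). In these single-source manifests, paths are resolved inside the chart package, so files under values/ in your Git repository are not read unless the Application is configured to use them (for example, as an Argo CD multi-source Application)
spec.destination.server | Target cluster; use https://kubernetes.default.svc for the cluster where Argo CD itself runs
spec.destination.namespace | Namespace where the component is installed
spec.syncPolicy.automated.prune | If true, deletes cluster resources that are no longer part of the desired state
spec.syncPolicy.automated.selfHeal | If true, reverts manual changes made directly in the cluster
spec.syncPolicy.syncOptions (CreateNamespace=true) | Creates the destination namespace if it does not exist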
Figure 4 - Creating a manifest (PostgreSQL example)

1. PostgreSQL

PostgreSQL is the relational database used by several platform components: Airflow (DAG metadata and run history), OpenMetadata (data catalogue), Ranger (security policies), Superset (dashboard metadata), and CloudBeaver (settings). It must be running before those components.

apps/tdp-postgresql.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-postgresql
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-postgresql
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

2. Kafka (Strimzi)

Apache Kafka is the platform’s messaging and event streaming layer. It is operated by Strimzi, which automates broker and topic lifecycle in Kubernetes.

apps/tdp-kafka.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-kafka
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-kafka
    targetRevision: 3.0.0
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

3. Hive Metastore

Hive Metastore stores table metadata (columns, types, partitions) for platform-managed tables; Spark and Trino use it to discover and query data.

apps/tdp-hive-metastore.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-hive-metastore
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-hive-metastore
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

4. Delta Lake

Delta Lake is an open storage layer that adds ACID transactions, versioning, and schema control on object storage, for reliable large-scale read/write.

apps/tdp-deltalake.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-deltalake
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-deltalake
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

5. Iceberg

Apache Iceberg is a high-performance open table format for large analytical datasets, with schema evolution, hidden partitioning, and time travel; widely used with Trino and Spark.

apps/tdp-iceberg.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-iceberg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-iceberg
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

6. Apache Ozone

Apache Ozone is a distributed object store with an S3-compatible API; on-premises it can replace Amazon S3 or MinIO as primary data storage.

apps/tdp-ozone.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-ozone
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-ozone
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

7. Ranger

Apache Ranger provides security and access governance: central authorization for platform services and audit logs. Depends on PostgreSQL.

apps/tdp-ranger.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-ranger
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-ranger
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

8. NiFi

Apache NiFi is a data ingestion and routing tool with a visual UI for building pipelines that move, transform, and integrate data.

apps/tdp-nifi.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-nifi
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-nifi
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

9. Spark

Apache Spark is the platform’s distributed processing engine for batch jobs, large-scale transforms, and ML pipelines.

apps/tdp-spark.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-spark
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-spark
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

10. Trino

Trino is a distributed SQL query engine for analytical queries over data in Ozone/S3, Delta Lake, or Iceberg without moving data.

apps/tdp-trino.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-trino
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-trino
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

11. Airflow

Apache Airflow orchestrates data pipelines: create, schedule, and monitor DAGs for ingest, transform, and load. Depends on PostgreSQL.

apps/tdp-airflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-airflow
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-airflow
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

12. JupyterHub

JupyterHub is a multi-user notebook server so analysts and data scientists can run Jupyter notebooks in the cluster with access to data and compute.

apps/tdp-jupyter.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-jupyter
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-jupyter
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

13. ClickHouse

ClickHouse is the platform’s OLAP database, optimized for fast analytical queries on large datasets — ideal for dashboards and near-real-time reporting.

apps/tdp-clickhouse.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-clickhouse
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-clickhouse
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

14. CloudBeaver

CloudBeaver is a web UI to manage databases (PostgreSQL, ClickHouse, etc.) and run queries in the browser. Depends on PostgreSQL.

apps/tdp-cloudbeaver.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-cloudbeaver
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-cloudbeaver
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

15. OpenMetadata

OpenMetadata is the data catalogue and governance layer: discover, document, and manage tables, pipelines, dashboards, with lineage and quality. Depends on PostgreSQL.

apps/tdp-openmetadata.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-openmetadata
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-openmetadata
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

16. Superset

Apache Superset is the BI and visualization tool: interactive dashboards connected to ClickHouse, Trino, and other data stores. Depends on PostgreSQL.

apps/tdp-superset.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-superset
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.tecnisys.com.br/tdp
    chart: tdp-superset
    targetRevision: 3.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: <namespace>
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Step 4 – Create the App of Apps manifest

The App of Apps pattern lets you bootstrap the whole stack with a single command.
Instead of applying many manifests one by one, create a root Application (app-of-apps.yaml) pointing at the apps/ folder in your Git repo.
Argo CD reads that folder, discovers the manifests, and creates every Application defined there.

Create app-of-apps.yaml at the repository root.
Set repoURL to your Git repository URL (where you stored the files from Step 3):

app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tdp-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.seu-servidor.com.br/seu-usuario/tdp-gitops.git
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

note

spec.source.repoURL must point to the Git repo you created in Step 2 with the Application manifests. If the repo is private, register it first in Argo CD under Settings → Repositories.

Figure 5 - App of Apps manifest (example)

Step 5 – Apply the App of Apps

With the GitOps repo and manifests ready, apply the root Application to start installing all components.

  1. Clone the Git repository to your machine:
Terminal input
git clone https://git.seu-servidor.com.br/seu-usuario/tdp-gitops.git
Figure 6 - Git clone (example)

  2. Apply the manifest:
Terminal input
cd tdp-gitops
kubectl apply -f app-of-apps.yaml
Figure 7 - Applying the manifest (example)

From then on, Argo CD will discover every manifest under apps/ and start installing all TDP Kubernetes components.

Track progress

Terminal input
argocd app list
Figure 8 - app list
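Besides the UI and argocd app list, you can check from a script which Applications are still converging. The following is a minimal sketch that assumes kubectl access to the argocd namespace. It reads the Application resources with kubectl get applications.argoproj.io -n argocd -o json and reports any that are not both Synced and Healthy, using the status.sync.status and status.health.status fields of the Application resource. The function names are hypothetical.

```python
import json
import subprocess


def pending_apps(applications_json: str) -> list[tuple[str, str, str]]:
    """Return (name, sync status, health status) for every Application that is
    not yet both Synced and Healthy. Missing status fields count as Unknown."""
    pending = []
    for app in json.loads(applications_json).get("items", []):
        name = app["metadata"]["name"]
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status", "Unknown")
        health = status.get("health", {}).get("status", "Unknown")
        if (sync, health) != ("Synced", "Healthy"):
            pending.append((name, sync, health))
    return pending


def fetch_applications(namespace: str = "argocd") -> str:
    """Read all Argo CD Application resources as JSON (requires kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "applications.argoproj.io", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, calling pending_apps(fetch_applications()) while the stack is coming up lists the components that are still Progressing or OutOfSync, including the root tdp-stack Application. An empty list means everything is Synced and Healthy.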

Sync policies

The manifests in this documentation use syncPolicy.automated — Argo CD syncs automatically whenever it detects drift.
The two control flags are:

Parameter | When true
--- | ---
prune | Removes cluster resources that were removed from the Git manifest
selfHeal | Reverts manual changes made outside Git
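The effect of the two flags can be illustrated with a deliberately simplified model. This is not Argo CD's implementation: each resource is reduced to a name and a spec string, and the helper names are hypothetical. Without prune, resources that disappeared from Git are left in place. Without selfHeal, a change made directly in the cluster does not by itself trigger an automatic sync.

```python
def plan_sync(desired: dict[str, str], live: dict[str, str], prune: bool) -> dict[str, list[str]]:
    """Compare desired (Git) and live (cluster) state for the resources managed
    by one Application and return the actions a sync would take."""
    create = sorted(name for name in desired if name not in live)
    update = sorted(name for name in desired if name in live and live[name] != desired[name])
    # Resources no longer in Git are deleted only when prune is enabled.
    delete = sorted(name for name in live if name not in desired) if prune else []
    return {"create": create, "update": update, "delete": delete}


def auto_sync_triggered(git_changed: bool, live_drifted: bool, self_heal: bool) -> bool:
    """Automated sync reacts to new Git revisions; drift caused by manual
    changes in the cluster triggers a sync only when selfHeal is enabled."""
    return git_changed or (live_drifted and self_heal)
```

For example, if Git now declares deploy/a at v2 plus a new svc/a, and the cluster still runs deploy/a at v1 plus an old cm/old, then plan_sync with prune=True creates svc/a, updates deploy/a, and deletes cm/old. With prune=False, cm/old is kept.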

For stricter control, remove the automated block from the manifest.
In that case, start sync manually:

Terminal input
argocd app sync tdp-postgresql
Figure 9 - app sync
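Without automated sync, you control the order yourself. The following is a minimal sketch that assumes the argocd CLI is installed and logged in. It syncs Applications one at a time and waits for each one to become healthy (argocd app wait --health) before moving on, so tdp-postgresql is ready before the components that depend on it. The helper names and the 600-second timeout are illustrative choices.

```python
import subprocess
from typing import Callable, Sequence

# A runner receives one command as a list of arguments.
Runner = Callable[[list[str]], object]


def run_command(cmd: list[str]) -> None:
    """Run one CLI command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def sync_in_order(apps: Sequence[str], run: Runner = run_command, timeout: int = 600) -> None:
    """Sync each Application in sequence, waiting until it is healthy
    before starting the next one."""
    for app in apps:
        run(["argocd", "app", "sync", app])
        run(["argocd", "app", "wait", app, "--health", "--timeout", str(timeout)])
```

For example, sync_in_order(["tdp-postgresql", "tdp-airflow", "tdp-superset"]) follows the installation order table for those three components. Passing a different run function, such as one that only prints each command, gives you a dry run.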

Footnotes

  1. OCI (Open Container Initiative) artifacts are the modern standard for storing not only container images but also Helm charts. When registering the registry, you must check Enable OCI. Without it, Argo CD treats the address as a conventional Helm repository based on index.yaml, and the connection will fail.

  2. selfHeal: automatic reconciliation of the cluster with the state declared in Git.