Airflow configuration
See Apache Airflow — Concepts for a full overview of the tool, its architecture, and how it works.
In the component configuration file, Airflow options are grouped under the tdp-airflow key.
The project packages Apache Airflow 3.0.2 for Kubernetes, with KubernetesExecutor as the default executor.
Default configuration
This section describes the starting point for installing Airflow in TDP Kubernetes. The goal is to record the component’s initial behavior before detailing database, authentication, persistence, and integration settings.
By default, the chart favors a working install with minimal external dependencies: Kubernetes executor, metadata in local PostgreSQL, and DAGs on a persistent volume. Revising these defaults depends on the operational policy adopted for database, storage, authentication, and observability, as described in General configuration.
Default behavior of the tdp-airflow chart:
- Executor: KubernetesExecutor
- Database: bundled PostgreSQL (subchart), with tdp-airflow.postgresql.enabled: true by default
- DAGs: PVC persistence enabled by default (tdp-airflow.dags.persistence.enabled: true, typical size 5Gi)
- Logs: PVC persistence disabled by default (tdp-airflow.logs.persistence.enabled: false); enable only if the StorageClass meets the required access pattern (usually RWX)
The command below is the usual way to install or upgrade Airflow. The meu-values.yaml file holds only the settings required for the environment, without changing the basic procedure.
helm upgrade --install <release> oci://registry.tecnisys.com.br/tdp/charts/tdp-airflow \
  -n <namespace> --create-namespace \
  -f meu-values.yaml
The first install can take several minutes (image pulls, database migrations, hooks). Add --wait with an adequate timeout (for example --timeout 15m) so that Helm does not fail prematurely; Helm's default timeout is 5 minutes.
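As an illustration of how small meu-values.yaml can stay, the sketch below overrides only two settings and leaves every other chart default in place. The keys come from the parameter table on this page; the values chosen (a UTC timezone and a 10Gi DAG volume) are assumptions for the example, not recommendations.

```yaml
# meu-values.yaml: hypothetical minimal override file.
# Only the keys listed here differ from the chart defaults.
tdp-airflow:
  config:
    core:
      default_timezone: UTC   # example value; the chart default is America/Sao_Paulo
  dags:
    persistence:
      size: 10Gi              # example value; the chart default is 5Gi
```

Keeping the file limited to real deviations makes it easier to review what changed from the component default.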
Access
Access to the Airflow web UI is provided by a Kubernetes Service, usually of type ClusterIP.
For local access during validation or testing:
kubectl -n <namespace> port-forward svc/<release>-api-server 8080:8080
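The Service type is controlled by tdp-airflow.apiServer.service.type, listed in the parameter table. The fragment below only makes the default explicit. Whether other Kubernetes Service types (such as NodePort or LoadBalancer) need additional chart settings is not covered on this page, so check helm show values before changing it.

```yaml
tdp-airflow:
  apiServer:
    service:
      type: ClusterIP   # chart default; reachable inside the cluster or via port-forward
```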
Database configuration
Airflow needs a relational database to store its metadata: registered DAGs, run history, connections, variables, and users.
Without a stable database, Airflow cannot keep the history and metadata that make the service usable day to day.
Therefore, choosing between bundled and external PostgreSQL is one of the first configuration decisions:
- bundled PostgreSQL simplifies the initial install;
- external PostgreSQL is usually preferred when the environment already has its own backup, availability, and database administration standards.
To understand when to use each option, see Internal versus external PostgreSQL on the General configuration page.
Bundled PostgreSQL (default)
Suitable only for development and testing:
tdp-airflow:
  postgresql:
    enabled: true
For production, prefer external PostgreSQL.
External PostgreSQL
Disable the bundled database and provide the external PostgreSQL connection details. In practice, this is the most common scenario when the customer already uses platform PostgreSQL or a separately managed database service.
Example using the TDP integration helpers (TDPConfigurations.externalDatabase):
tdp-airflow:
  postgresql:
    enabled: false
  data:
    metadataSecretName: "<release>-airflow-database"
    metadataConnection:
      user: airflow
      pass: ""
      protocol: postgresql
      host: "<postgres-service>.<namespace>.svc.cluster.local"
      port: 5432
      db: airflow
      sslmode: disable
TDPConfigurations:
  externalDatabase:
    enabled: true
    recreate: false
    externalSecret:
      releaseName: "<tdp-postgresql-release>"
      area: "<area>"
With TDPConfigurations.externalDatabase.enabled: true, the chart uses the supplied release and area to align database, user, and metadata Secret with the TDP stack.
Prefer setting metadataSecretName and leaving pass empty when TDP jobs are responsible for generating the connection Secret.
LDAP authentication
LDAP is optional and is off by default.
While it is off, there is no dependency on a corporate directory or extra Secrets for that purpose. When LDAP is enabled, Airflow authentication depends on the organization’s directory settings—server, search base, users, and sensitive values stored in a Secret.
Configuration uses tdp-airflow.ldap and variables injected via tdp-airflow.extraEnv.
See Security — Airflow for bind Secret and LDAP configuration examples.
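A minimal sketch of switching LDAP on and passing the bind password from a Secret. Only tdp-airflow.ldap.enabled and tdp-airflow.extraEnv come from this page; the variable name LDAP_BIND_PASSWORD and the Secret placeholders are hypothetical. The directory settings themselves (server, search base) are left out because their structure is documented on the Security page for Airflow.

```yaml
tdp-airflow:
  ldap:
    enabled: true   # off by default
  # LDAP_BIND_PASSWORD is a hypothetical variable name; the Secret name and
  # key below are placeholders.
  extraEnv: |
    - name: LDAP_BIND_PASSWORD
      valueFrom:
        secretKeyRef:
          name: <ldap-bind-secret>
          key: <password-key>
```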
DAG persistence (PVC)
DAGs must be available to the components that orchestrate and run workflows. This section is less about “storing files” and more about ensuring Airflow can find DAGs reliably in the environment.
By default the chart already creates a PVC for DAGs. Adjust size, StorageClass, or access mode to match the cluster:
tdp-airflow:
  dags:
    persistence:
      enabled: true
      size: 5Gi
      storageClassName: ""
      accessMode: ReadWriteOnce
With KubernetesExecutor, several pods need to read the same DAGs.
A ReadWriteOnce volume can be mounted from a single node only, so if the scheduler, the API server (web UI), and task pods may run on different nodes, use a StorageClass that supports ReadWriteMany (RWX), if the environment offers one.
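When the DAG volume has to be mounted from several nodes, the persistence block shown above could be adjusted roughly as follows. This is a sketch: <rwx-storage-class> is a placeholder for a StorageClass in your cluster that supports ReadWriteMany, which not every cluster provides.

```yaml
tdp-airflow:
  dags:
    persistence:
      enabled: true
      size: 5Gi
      storageClassName: "<rwx-storage-class>"   # placeholder; must support RWX
      accessMode: ReadWriteMany
```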
Log persistence (PVC)
By default logs are not on a shared PVC. That simplifies the initial install and avoids requiring a specific storage type on first use.
With enabled: false (the chart default), log retention depends on the observability strategy already adopted in the environment. With enabled: true, the cluster must provide a volume that accepts writes from multiple components, so evaluate the volume access mode before enabling it.
tdp-airflow:
  logs:
    persistence:
      enabled: false
      size: 10Gi
      storageClassName: ""
Because several components write logs, an RWO volume is often unsuitable for log persistence.
Prefer an RWX-capable StorageClass or another log strategy (for example an external logging stack) that matches the environment.
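If log persistence is enabled, the values might look like the sketch below. It uses only the keys shown on this page; <rwx-storage-class> is a placeholder, and no access-mode key is set because this page does not document one for logs (check helm show values for the version in use).

```yaml
tdp-airflow:
  logs:
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: "<rwx-storage-class>"   # placeholder; should support RWX
```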
S3 connection (TDP)
Configure this section when Airflow must access the cluster object store—for example so DAGs live in Apache Ozone instead of a PVC, or so Airflow operators read and write files on S3-compatible storage.
To understand the concept and when an S3 connection is needed, see S3-compatible storage (Ozone) on the General configuration page.
With TDPConfigurations.s3Connection.enabled: true, the chart creates a Secret with S3-compatible connection parameters for TDP integrations. Use placeholders; do not commit credentials in plain text:
TDPConfigurations:
  s3Connection:
    enabled: true
    secretName: "<s3-connection-secret>"
    name: "<connection-name>"
    type: "aws"
    accessKey: "<s3-access-key>"
    secretKey: "<s3-secret-key>"
    uri: "https://<s3-endpoint>"
Additional environment variables
Use tdp-airflow.extraEnv to reference Secrets or inject other variables (for example LDAP password):
tdp-airflow:
  extraEnv: |
    - name: EXAMPLE_ENV
      valueFrom:
        secretKeyRef:
          name: <secret-name>
          key: <secret-key>
Python dependencies and image
With KubernetesExecutor, each task runs in a new pod.
Installing packages at pod startup (for example via an init container) therefore adds the install time to every task and can greatly increase startup time.
The usual production approach is a custom image with dependencies preinstalled; set the same image in tdp-airflow.images.airflow and tdp-airflow.images.pod_template.
For more options (hooks, images), see the chart’s exported values (helm show values) and the upstream Apache Airflow chart documentation.
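A sketch of pointing both image settings at the same custom image. The repository and tag sub-keys follow the upstream Apache Airflow chart layout and are an assumption for tdp-airflow; confirm them with helm show values. The registry path and tag are placeholders.

```yaml
tdp-airflow:
  images:
    airflow:
      repository: "<registry>/<custom-airflow-image>"   # placeholder
      tag: "<tag>"                                      # placeholder
    pod_template:
      repository: "<registry>/<custom-airflow-image>"   # same image as above
      tag: "<tag>"
```

Using one image for both entries keeps the dependencies seen by the core components and by task pods identical.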
Main parameters
The table below summarizes the parameters most often consulted when configuring Airflow. Use it as a quick reference to review what was changed from the component default.
On a first pass, the areas that usually deserve the most attention are: database, DAG and log persistence, LDAP authentication, and integrations with shared platform services.
| Parameter | Description | Default |
|---|---|---|
| tdp-airflow.enabled | Enables the Airflow deploy | true |
| tdp-airflow.executor | Executor | KubernetesExecutor |
| tdp-airflow.config.core.default_timezone | Default timezone | America/Sao_Paulo |
| tdp-airflow.apiServer.service.type | Type of Service for the Airflow web UI | ClusterIP |
| tdp-airflow.postgresql.enabled | Bundled PostgreSQL | true |
| tdp-airflow.dags.persistence.enabled | DAG PVC | true |
| tdp-airflow.dags.persistence.size | DAG PVC size | 5Gi |
| tdp-airflow.logs.persistence.enabled | Log PVC | false |
| tdp-airflow.logs.persistence.size | Log PVC size | 10Gi |
| tdp-airflow.ldap.enabled | LDAP | false |
| tdp-airflow.ldap.apiServerConfig | Flask-AppBuilder snippet | See helm show values for the version in use |
| tdp-airflow.extraEnv | Extra variables (e.g. Secret) | "" |
| tdp-airflow.data | Metadata / external DB | Off by default; structure in helm show values |
| TDPConfigurations.externalDatabase.enabled | TDP external DB helpers | false |
| TDPConfigurations.s3Connection.enabled | TDP S3 connection Secret | false |