Version 3.0.0

Airflow configuration

Learn more

See Apache Airflow — Concepts for a full overview of the tool, its architecture, and how it works.

In the component configuration file, Airflow options are grouped under the tdp-airflow key.

The project packages Apache Airflow 3.0.2 for Kubernetes, with KubernetesExecutor as the default executor.

Default configuration

This section describes the starting point for installing Airflow on TDP Kubernetes: how the component behaves out of the box, before the database, authentication, persistence, and integration settings are covered in detail.

Default behavior of the tdp-airflow chart:

By default, the chart favors a working install with minimal external dependencies: Kubernetes executor, metadata in a bundled PostgreSQL, and DAGs on a persistent volume. Whether to change these defaults depends on the operational policy adopted for database, storage, authentication, and observability, as described in General configuration.

  • Executor: KubernetesExecutor
  • Database: Bundled PostgreSQL (subchart), with tdp-airflow.postgresql.enabled: true by default
  • DAGs: PVC persistence enabled by default (tdp-airflow.dags.persistence.enabled: true, default size 5Gi)
  • Logs: PVC persistence disabled by default (tdp-airflow.logs.persistence.enabled: false); enable only if the StorageClass meets the required access pattern (usually RWX)

The command below is the usual way to install or upgrade Airflow. The meu-values.yaml file holds only the settings required for the environment, without changing the basic procedure.

Terminal input
helm upgrade --install <release> oci://registry.tecnisys.com.br/tdp/charts/tdp-airflow \
-n <namespace> --create-namespace \
-f meu-values.yaml
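
As an illustration of what meu-values.yaml might contain, the sketch below restates the defaults listed earlier, using only parameter names that appear on this page. It is one possible way to organize the file, not a required layout; in practice the file would hold only the keys that differ from the defaults.

```yaml
# meu-values.yaml (sketch): restates chart defaults documented on this page.
tdp-airflow:
  executor: KubernetesExecutor
  postgresql:
    enabled: true        # bundled PostgreSQL subchart
  dags:
    persistence:
      enabled: true
      size: 5Gi
  logs:
    persistence:
      enabled: false     # enable only with suitable (usually RWX) storage
```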
Tip

The first install can take several minutes (images, migrations, hooks). Use --wait with an adequate timeout (for example --timeout 15m) to avoid premature Helm failure.

Access

The Airflow web UI is exposed through a Kubernetes Service, of type ClusterIP by default.

For local access during validation or testing, forward the Service port and open http://localhost:8080:

Terminal input
kubectl -n <namespace> port-forward svc/<release>-api-server 8080:8080
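
The Service type is controlled by tdp-airflow.apiServer.service.type. The fragment below only restates the default. NodePort and LoadBalancer are the other standard Kubernetes Service types; whether either is appropriate depends on the cluster's ingress and security policy, which this page does not prescribe.

```yaml
tdp-airflow:
  apiServer:
    service:
      type: ClusterIP   # default; reachable inside the cluster or via port-forward
```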

Database configuration

Airflow needs a relational database to store its metadata: registered DAGs, run history, connections, variables, and users.

Without a stable database, Airflow cannot keep the history and metadata that make the service usable day to day.

Therefore, choosing between bundled and external PostgreSQL is one of the first configuration decisions:

  • bundled PostgreSQL simplifies the initial install;
  • external PostgreSQL is usually preferred when the environment already has its own backup, availability, and database administration standards.

To understand when to use each option, see Internal versus external PostgreSQL on the General configuration page.

Bundled PostgreSQL (default)

Suitable only for development and testing:

tdp-airflow:
  postgresql:
    enabled: true
Note

For production, prefer external PostgreSQL.

External PostgreSQL

Disable the bundled database and provide the external PostgreSQL connection details. In practice, this is the most common scenario when the customer already uses platform PostgreSQL or a separately managed database service.

Example using the TDP integration helpers:

tdp-airflow:
  postgresql:
    enabled: false

  data:
    metadataSecretName: "<release>-airflow-database"
    metadataConnection:
      user: airflow
      pass: ""
      protocol: postgresql
      host: "<postgres-service>.<namespace>.svc.cluster.local"
      port: 5432
      db: airflow
      sslmode: disable

TDPConfigurations:
  externalDatabase:
    enabled: true
    recreate: false
    externalSecret:
      releaseName: "<tdp-postgresql-release>"
      area: "<area>"
With TDPConfigurations.externalDatabase.enabled: true, the chart uses the supplied release and area to align database, user, and metadata Secret with the TDP stack.

Tip

Prefer setting metadataSecretName and leaving pass empty when TDP jobs are responsible for generating the connection Secret.
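
When the Secret is not generated by TDP jobs, it has to exist before the install. The manifest below is a sketch that assumes the chart follows the upstream Apache Airflow chart convention, in which the metadata Secret carries the full connection URI under a key named connection. Confirm this against helm show values for the version in use. All bracketed values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: "<release>-airflow-database"   # must match data.metadataSecretName
  namespace: "<namespace>"             # namespace where Airflow is installed
type: Opaque
stringData:
  # Assumed key name (upstream convention); the URI mirrors metadataConnection.
  connection: "postgresql://airflow:<password>@<postgres-service>.<namespace>.svc.cluster.local:5432/airflow?sslmode=disable"
```

Keeping the password only in the Secret, and out of the values file, is consistent with leaving pass empty as recommended above.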

LDAP authentication

LDAP is optional and is off by default.

While it is off, there is no dependency on a corporate directory or extra Secrets for that purpose. When LDAP is enabled, Airflow authentication depends on the organization’s directory settings—server, search base, users, and sensitive values stored in a Secret.

Configuration uses tdp-airflow.ldap and variables injected via tdp-airflow.extraEnv.
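
As a rough sketch of how the two keys fit together, the fragment below turns LDAP on and injects a bind password from a Secret. The variable name LDAP_BIND_PASSWORD, the Secret name, and the key are hypothetical, chosen only for illustration. Directory settings such as server and search base are omitted because their exact keys are not listed on this page; they are covered in the detailed documentation.

```yaml
tdp-airflow:
  ldap:
    enabled: true
  # Hypothetical variable and Secret names, for illustration only.
  extraEnv: |
    - name: LDAP_BIND_PASSWORD
      valueFrom:
        secretKeyRef:
          name: <ldap-bind-secret>
          key: <password-key>
```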

Detailed documentation

See Security — Airflow for bind Secret and LDAP configuration examples.

DAG persistence (PVC)

DAGs must be available to the components that orchestrate and run workflows. This section is less about “storing files” and more about ensuring Airflow can find DAGs reliably in the environment.

By default the chart already creates a PVC for DAGs. Adjust size, StorageClass, or access mode to match the cluster:

tdp-airflow:
  dags:
    persistence:
      enabled: true
      size: 5Gi
      storageClassName: ""
      accessMode: ReadWriteOnce
Access mode

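With KubernetesExecutor, several pods need to read the same DAGs.

A ReadWriteOnce (RWO) volume can be mounted read-write by a single node only. If the pods that need the DAGs (scheduler, API server, and task pods) can be scheduled on different nodes, use a StorageClass that supports ReadWriteMany (RWX), where the environment provides one.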
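
A minimal sketch of the RWX variant, using the same keys as the default block above. The StorageClass name is a placeholder; it must refer to a class in the cluster that actually supports ReadWriteMany.

```yaml
tdp-airflow:
  dags:
    persistence:
      enabled: true
      size: 5Gi
      storageClassName: "<rwx-storage-class>"   # placeholder: a class that supports RWX
      accessMode: ReadWriteMany
```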

Log persistence (PVC)

By default logs are not on a shared PVC. That simplifies the initial install and avoids requiring a specific storage type on first use.

In practice, with enabled: false, log retention depends on the observability strategy already adopted in the environment. With enabled: true, the cluster must provide a volume compatible with writes from multiple components.

Chart default: enabled: false. Before enabling it, check the volume access mode: several components write logs, so a ReadWriteOnce (RWO) volume is often unsuitable.

tdp-airflow:
  logs:
    persistence:
      enabled: false
      size: 10Gi
      storageClassName: ""
Note

ReadWriteOnce (RWO) volumes are often unsuitable for log persistence.

Prefer RWX or another log strategy (for example an external stack) aligned with the environment.
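
If log persistence is enabled, the fragment might look like the sketch below. It uses only the keys shown in the default block; the StorageClass name is a placeholder for a class that supports RWX. This page does not list an access mode key for logs, so check helm show values before assuming one exists.

```yaml
tdp-airflow:
  logs:
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: "<rwx-storage-class>"   # placeholder: a class that supports RWX
```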

S3 connection (TDP)

Configure this section when Airflow must access the cluster object store—for example so DAGs live in Apache Ozone instead of a PVC, or so Airflow operators read and write files on S3-compatible storage.

To understand the concept and when an S3 connection is needed, see S3-compatible storage (Ozone) on the General configuration page.

With TDPConfigurations.s3Connection.enabled: true, the chart creates a Secret with S3-compatible connection parameters for TDP integrations. Use placeholders; do not commit credentials in plain text:

TDPConfigurations:
  s3Connection:
    enabled: true
    secretName: "<s3-connection-secret>"
    name: "<connection-name>"
    type: "aws"
    accessKey: "<s3-access-key>"
    secretKey: "<s3-secret-key>"
    uri: "https://<s3-endpoint>"

Additional environment variables

Use tdp-airflow.extraEnv to reference Secrets or inject other variables (for example LDAP password):

tdp-airflow:
  extraEnv: |
    - name: EXAMPLE_ENV
      valueFrom:
        secretKeyRef:
          name: <secret-name>
          key: <secret-key>

Python dependencies and image

With KubernetesExecutor, each task runs in a new pod.

Installing packages at pod startup (for example via an init container) repeats the install for every task pod and can greatly increase startup time.

The usual production approach is a custom image with dependencies preinstalled; set the same image in tdp-airflow.images.airflow and tdp-airflow.images.pod_template.
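
A sketch of pointing both keys at the same custom image. The repository and tag subkeys are an assumption based on the upstream Apache Airflow chart; verify them with helm show values for the version in use. The registry, image name, and tag are placeholders.

```yaml
tdp-airflow:
  images:
    airflow:
      repository: "<registry>/<custom-airflow-image>"
      tag: "<tag>"
    pod_template:
      # Same image as above, so task pods start with dependencies preinstalled.
      repository: "<registry>/<custom-airflow-image>"
      tag: "<tag>"
```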

For more options (hooks, images), see the chart’s exported values (helm show values) and the upstream Apache Airflow chart documentation.

Main parameters

The table below summarizes the parameters most often consulted when configuring Airflow. Use it as a quick reference to review what was changed from the component default.

On a first pass, the areas that usually deserve the most attention are: database, DAG and log persistence, LDAP authentication, and integrations with shared platform services.

| Parameter | Description | Default |
|---|---|---|
| tdp-airflow.enabled | Enables the Airflow deploy | true |
| tdp-airflow.executor | Executor | KubernetesExecutor |
| tdp-airflow.config.core.default_timezone | Default timezone | America/Sao_Paulo |
| tdp-airflow.apiServer.service.type | Type of Service for the Airflow web UI | ClusterIP |
| tdp-airflow.postgresql.enabled | Bundled PostgreSQL | true |
| tdp-airflow.dags.persistence.enabled | DAG PVC | true |
| tdp-airflow.dags.persistence.size | DAG PVC size | 5Gi |
| tdp-airflow.logs.persistence.enabled | Log PVC | false |
| tdp-airflow.logs.persistence.size | Log PVC size | 10Gi |
| tdp-airflow.ldap.enabled | LDAP | false |
| tdp-airflow.ldap.apiServerConfig | Flask-AppBuilder snippet | See helm show values for the version in use |
| tdp-airflow.extraEnv | Extra variables (e.g. Secret) | "" |
| tdp-airflow.data | Metadata / external DB | Off by default; structure in helm show values |
| TDPConfigurations.externalDatabase.enabled | TDP external DB helpers | false |
| TDPConfigurations.s3Connection.enabled | TDP S3 connection Secret | false |