
Integrations — Kafka

Integration overview

The tdp-kafka chart supports integration with relational databases through Debezium connectors running on top of Kafka Connect.

What is Kafka Connect and Debezium?

Kafka Connect is a framework in the Kafka ecosystem for moving data between Kafka and external systems in a scalable and fault-tolerant manner, without writing code. Each integration is declared as a connector.

Debezium is a set of Kafka Connect connectors specialized in Change Data Capture (CDC): instead of periodically querying the database, Debezium reads the database transaction log (WAL in PostgreSQL, binlog in MySQL, etc.) and publishes each insert, update, or delete as a Kafka event — in near real-time and with minimal impact on the source database.

Database  →  transaction log  →  Debezium  →  Kafka Topic  →  consumers
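As an illustration, a simplified Debezium change event for an update to a row in `pgapp.public.orders` might look like the sketch below. The envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) are standard in Debezium, but exact contents vary by connector and configuration — for example, `before` may be `null` on PostgreSQL unless the table's replica identity exposes old values. Values shown are hypothetical:

```json
{
  "payload": {
    "before": { "id": 42, "status": "pending" },
    "after":  { "id": 42, "status": "shipped" },
    "source": { "connector": "postgresql", "db": "appdb", "table": "orders" },
    "op": "u",
    "ts_ms": 1700000000000
  }
}
```

The `op` field distinguishes the change type: `c` (create), `u` (update), `d` (delete), and `r` (snapshot read).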

Why use CDC?

CDC is superior to periodic polling because: it captures all changes (including deletions), operates in real time, does not overload the database with full-scan queries, and preserves the order of changes.

Supported databases

| Database   | Chart key                 |
| ---------- | ------------------------- |
| PostgreSQL | `kafkaConnects.postgres`  |
| MySQL      | `kafkaConnects.mysql`     |
| SQL Server | `kafkaConnects.sqlserver` |
| Oracle     | `kafkaConnects.oracle`    |

Configuration structure

Each integration follows the same hierarchy:

```yaml
kafkaConnects:
  <type>:
    enabled: true        # enables the KafkaConnect cluster
    connect:             # Kafka Connect cluster settings
      name: ...
      replicas: ...
      bootstrapServers: ...
      kafkaVersion: ...
    connectors:          # one or more connectors in this cluster
      <logical-name>:
        enabled: true
        name: ...        # KafkaConnector resource name
        topicPrefix: ... # prefix for generated topics
        <type>:          # database-specific parameters
          ...
```

Credential management

The chart offers two options for database credentials:

| Option                    | Parameter                                  | When to use                                                |
| ------------------------- | ------------------------------------------ | ---------------------------------------------------------- |
| Create Secret automatically | `createSecret: true` + credentials in values | Environments without external Secret management            |
| Use existing Secret       | `createSecret: false` + `secretName: <name>` | Environments with Vault, Sealed Secrets, or custom management |
Credentials in values

When using createSecret: true, credentials are stored in the values applied to the release. Do not commit passwords to public or shared Git repositories; use private values files or targeted --set flags in CI/CD pipelines.
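For the existing-Secret option, a minimal sketch follows. The Secret name appdb-db-credentials is illustrative, and the set of keys the Secret must contain depends on what the chart expects — check the chart's values reference before relying on this:

```yaml
kafkaConnects:
  postgres:
    connectors:
      appdb:
        postgres:
          createSecret: false
          # Pre-existing Secret, e.g. synced from Vault or created by Sealed Secrets
          secretName: appdb-db-credentials
```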

PostgreSQL

Database prerequisite

Debezium uses PostgreSQL logical replication. Before enabling the connector:

  1. Configure wal_level = logical in PostgreSQL
  2. Create a user with replication permission and access to the monitored tables
  3. Enable logical publication (CREATE PUBLICATION) or let Debezium create it automatically
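The steps above can be sketched in SQL. The user and publication names are illustrative, and the grants should be narrowed to the tables you actually monitor:

```sql
-- 1. Requires a PostgreSQL restart to take effect
ALTER SYSTEM SET wal_level = 'logical';

-- 2. CDC user with replication permission and read access
CREATE ROLE debezium WITH LOGIN REPLICATION PASSWORD '<database-password>';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;

-- 3. Optional: pre-create the publication instead of letting Debezium create it
CREATE PUBLICATION dbz_publication FOR ALL TABLES;
```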
```yaml
kafkaConnects:
  postgres:
    enabled: true
    connect:
      name: connect-debezium-postgres
      replicas: 1
      bootstrapServers: <kafka-cluster-name>-kafka-bootstrap:9092
      kafkaVersion: "4.1.0"
    connectors:
      appdb:
        enabled: true
        name: debezium-postgres-appdb
        topicPrefix: "pgapp"
        postgres:
          createSecret: true
          host: "<postgresql-host>.<namespace>.svc.cluster.local"
          port: "5432"
          dbname: "appdb"
          user: "debezium"
          password: "<database-password>"
```

Generated topics follow the pattern: <topicPrefix>.<schema>.<table>. For example, with topicPrefix: pgapp, the table public.orders will generate the topic pgapp.public.orders.
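To inspect the events flowing into such a topic, you can run the console consumer from inside a broker pod. The pod name follows the Strimzi convention but may differ in your environment; the topic name assumes the pgapp example above:

```shell
kubectl -n <namespace> exec -it <kafka-cluster-name>-kafka-0 -- \
  bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic pgapp.public.orders \
  --from-beginning
```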

MySQL

Database prerequisite

Debezium reads the MySQL binlog. Ensure that:

  1. log_bin is enabled in MySQL
  2. binlog_format = ROW
  3. The CDC user has the following permissions: SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
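A sketch of the user setup in SQL (the user name and host pattern are illustrative; note that REPLICATION SLAVE and REPLICATION CLIENT are global privileges and must be granted ON *.*):

```sql
-- CDC user for Debezium
CREATE USER 'debezium'@'%' IDENTIFIED BY '<database-password>';

-- Permissions required to read the binlog and snapshot tables
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;
```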
```yaml
kafkaConnects:
  mysql:
    enabled: true
    connect:
      name: connect-debezium-mysql
      replicas: 1
      bootstrapServers: <kafka-cluster-name>-kafka-bootstrap:9092
      kafkaVersion: "4.1.0"
    connectors:
      appdb:
        enabled: true
        name: debezium-mysql-appdb
        topicPrefix: "mysqlapp"
        mysql:
          createSecret: true
          host: "<mysql-host>.<namespace>.svc.cluster.local"
          port: "3306"
          user: "debezium"
          password: "<database-password>"
          databaseIncludeList: "appdb"
```

The databaseIncludeList field restricts capture to the listed databases (comma-separated).

SQL Server

Database prerequisite

Debezium uses SQL Server's native CDC feature. For each database/table to monitor:

  1. Enable CDC on the database: EXEC sys.sp_cdc_enable_db
  2. Enable CDC on the tables: EXEC sys.sp_cdc_enable_table
  3. The CDC user must have read permission on the CDC tables (cdc.*)
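The steps above can be sketched in T-SQL; the database, schema, and table names are illustrative:

```sql
USE AppDb;

-- 1. Enable CDC at the database level
EXEC sys.sp_cdc_enable_db;

-- 2. Enable CDC for each monitored table
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name   = N'orders',
  @role_name     = NULL;  -- NULL: no gating role; set one to restrict access
```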
```yaml
kafkaConnects:
  sqlserver:
    enabled: true
    connect:
      name: connect-debezium-sqlserver
      replicas: 1
      bootstrapServers: <kafka-cluster-name>-kafka-bootstrap:9092
      kafkaVersion: "4.1.0"
    connectors:
      appdb:
        enabled: true
        name: debezium-sqlserver-appdb
        topicPrefix: "mssqlapp"
        sqlserver:
          createSecret: true
          host: "<sqlserver-host>.<namespace>.svc.cluster.local"
          port: "1433"
          user: "debezium"
          password: "<database-password>"
          databaseNames: "AppDb"
```

The databaseNames field lists the SQL Server databases to monitor (comma-separated).

Oracle

Database prerequisite

Debezium uses Oracle's LogMiner to read the redo log. The following are required:

  1. Database in ARCHIVELOG mode
  2. LogMiner enabled and configured
  3. User with permissions: LOGMINING, SELECT_CATALOG_ROLE, access to monitored tables and LogMiner views
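A minimal sketch of the user grants; the full grant list required by Debezium's Oracle connector is considerably longer in practice (and multitenant setups typically use a common `c##` user created with CONTAINER=ALL), so treat this only as a starting point:

```sql
-- Illustrative CDC user; adapt to your CDB/PDB topology
CREATE USER debezium IDENTIFIED BY "<database-password>";
GRANT CREATE SESSION TO debezium;
GRANT LOGMINING TO debezium;
GRANT SELECT_CATALOG_ROLE TO debezium;

-- Read access to each monitored table
GRANT SELECT ON <schema>.<table> TO debezium;
```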
```yaml
kafkaConnects:
  oracle:
    enabled: true
    connect:
      name: connect-debezium-oracle
      replicas: 1
      bootstrapServers: <kafka-cluster-name>-kafka-bootstrap:9092
      kafkaVersion: "4.1.0"
    connectors:
      appdb:
        enabled: true
        name: debezium-oracle-appdb
        topicPrefix: "oraapp"
        oracle:
          createSecret: true
          host: "<oracle-host>.<namespace>.svc.cluster.local"
          port: "1521"
          user: "debezium"
          password: "<database-password>"
          dbname: "ORCLCDB"
          pdbName: "ORCLPDB1"
```
| Field     | Description                                                    |
| --------- | -------------------------------------------------------------- |
| `dbname`  | Container Database (CDB) name                                  |
| `pdbName` | Pluggable Database (PDB) name where the monitored tables reside |
Oracle and JDBC driver

The Oracle JDBC driver is not distributed with Debezium due to license restrictions. Check the chart image documentation to confirm how the driver is made available in your version of tdp-kafka.

Deploy

```shell
helm upgrade --install <release> \
  oci://registry.tecnisys.com.br/tdp/charts/tdp-kafka \
  -n <namespace> \
  -f values.yaml \
  -f values-connectors.yaml
```

Verification

```shell
# List all connectors in the namespace
kubectl -n <namespace> get kafkaconnectors

# Inspect the status of a specific connector
kubectl -n <namespace> describe kafkaconnector <connector-name>

# View Kafka Connect pod logs
kubectl -n <namespace> logs -l strimzi.io/cluster=connect-debezium-postgres
```

The status.connectorStatus.connector.state field should show RUNNING for a healthy connector.
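For deeper inspection, the Kafka Connect REST API (listening on port 8083 by default) can be queried from inside a Connect pod. The pod and connector names below follow the PostgreSQL example and are illustrative:

```shell
kubectl -n <namespace> exec -it <connect-pod-name> -- \
  curl -s http://localhost:8083/connectors/debezium-postgres-appdb/status
```

The JSON response reports the state of the connector and of each of its tasks, which helps distinguish a failed connector from a failed task.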

Topic authorization with Ranger

Integration with Apache Ranger for authorization policies on Kafka topics is not within the scope configurable by this chart; treat it as an advanced environment configuration, performed directly on the Ranger plugin for Kafka.