Change Data Capture (CDC) turns your database's transaction log into a Kafka topic. Every INSERT, UPDATE, and DELETE becomes an event that downstream consumers can react to in near real time, without polling the database. Debezium is the most widely used open-source implementation.


How it taps the log

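Debezium reads each database's own change log through its native replication interface: the MySQL binlog, the Postgres WAL via logical decoding (a replication slot plus an output plugin such as pgoutput), and MongoDB change streams, which are built on the oplog. No triggers, no polling, no application changes. Each connector appears to the source database as a single replication client.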
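As a sketch of what "one replication client" looks like for Postgres, the helper below builds the JSON body you would POST to a Kafka Connect worker's /connectors endpoint to register one Debezium connector. The property names (connector.class, plugin.name, slot.name, topic.prefix, table.include.list) are Debezium Postgres connector settings as of 2.x; the helper name, hostnames, credentials, and table list are hypothetical placeholders.

```python
import json


def postgres_connector_payload(name: str, host: str, dbname: str, slot: str,
                               tables: list[str]) -> dict:
    """Build a Kafka Connect registration body for a Debezium Postgres connector.

    Hypothetical helper: host, credentials, and table names are placeholders.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "debezium",        # placeholder replication user
            "database.password": "change-me",   # placeholder; use a secrets provider
            "database.dbname": dbname,
            "topic.prefix": name,               # Debezium 2.x; 1.x used database.server.name
            "plugin.name": "pgoutput",          # logical decoding plugin built into Postgres 10+
            "slot.name": slot,                  # one dedicated replication slot per connector
            "table.include.list": ",".join(tables),
        },
    }


payload = postgres_connector_payload("orders", "db.internal", "orders_db",
                                     "debezium_orders", ["public.orders"])
print(json.dumps(payload, indent=2))
```

Keeping one slot per connector means each connector's progress, and the WAL it holds back, can be monitored and dropped independently.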

Event shape

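Each event records the row before and after the change, plus metadata about where it came from (the source block is abbreviated here). The op field is c (create), u (update), d (delete), or r (read, emitted for rows captured during the initial snapshot); create and read events have a null before, and delete events have a null after. For Postgres, before carries the complete old row only when the table's REPLICA IDENTITY is FULL; with the default setting it holds at most the primary key.

```json
{
  "op": "u",
  "ts_ms": 1719388800000,
  "source": { "db": "orders_db", "table": "orders", "txId": 42 },
  "before": { "id": 7, "status": "pending" },
  "after":  { "id": 7, "status": "shipped" }
}
```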
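A common first consumer keeps an up-to-date copy of the table by replaying these events. The sketch below assumes the event is the envelope payload (op/before/after, without the schema wrapper the JSON converter adds when schemas are enabled) and that the primary-key column is named id; apply_change is a hypothetical name.

```python
from typing import Any


def apply_change(table: dict[Any, dict], event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory copy of a table.

    Assumes `event` is the envelope payload and `key` names the primary-key column.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all become an upsert of the new row.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes have after = null; the key comes from the before image.
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unhandled op: {op!r}")


orders: dict[Any, dict] = {}
apply_change(orders, {"op": "r", "before": None, "after": {"id": 7, "status": "pending"}})
apply_change(orders, {"op": "u", "before": {"id": 7, "status": "pending"},
                      "after": {"id": 7, "status": "shipped"}})
print(orders)  # {7: {'id': 7, 'status': 'shipped'}}
```

Treating c, r, and u as upserts makes replays harmless: redelivering the same events in order converges to the same table. Deletes only need the key, so this also works with Postgres's default replica identity.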

Schema evolution

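Debezium can be paired with a schema registry (Confluent Schema Registry or Apicurio) by configuring an Avro converter; out of the box it emits JSON. With a registry in place, an ALTER TABLE results in a new schema version, registered when the next change event for that table is produced. Consumers using Avro deserializers handle backward-compatible changes, such as adding an optional column, automatically, subject to the registry's compatibility mode. Breaking changes (renaming a column, changing a type) require coordinating producers and consumers.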
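To make "backward-compatible" concrete: a change is backward compatible when a consumer on the new schema can still decode records written with the old one. The checker below is a deliberately simplified model (flat records, exact type matches, no Avro type promotions or aliases), not the algorithm a schema registry actually runs; Column and is_backward_compatible are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    type: str
    optional: bool  # in this model, optional means the reader can fill in null


Schema = dict[str, Column]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Can a reader using `new` decode records written with `old`? (simplified)"""
    for name, col in new.items():
        if name not in old:
            # A new required column has no value in old records.
            if not col.optional:
                return False
        elif old[name].type != col.type:
            # Type change; real Avro allows some promotions, this sketch does not.
            return False
    # Columns dropped from `new` are simply ignored by the reader.
    return True


v1 = {"id": Column("int", False), "status": Column("string", False)}
v2_add = {**v1, "carrier": Column("string", True)}
v2_rename = {"id": Column("int", False), "state": Column("string", False)}
v2_retype = {"id": Column("int", False), "status": Column("int", False)}
print(is_backward_compatible(v1, v2_add))     # True
print(is_backward_compatible(v1, v2_rename))  # False
print(is_backward_compatible(v1, v2_retype))  # False
```

A rename fails because, to the reader, it looks like dropping status and adding a required state that old records cannot supply. Making the renamed column optional would pass this check but silently null out the field, which is why renames still need coordination.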

Production gotchas

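The initial snapshot of a large table can take hours, and for Postgres the connector's replication slot holds back WAL recycling the whole time, so schedule it for a low-traffic window. After that, a stalled or lagging connector keeps its slot from advancing, and Postgres retains WAL until it catches up, which can fill the disk; alert on slot lag. Lag in downstream Kafka consumers does not pin WAL, because the slot advances as the connector commits its offsets. For Postgres, set wal_level = logical, leave headroom in max_wal_senders and max_replication_slots, and give each connector its own dedicated replication slot.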
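Slot lag is measured in WAL bytes between the server's current LSN and the slot's restart_lsn. Postgres 10+ can compute it directly with `SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;`. The sketch below does the same arithmetic in Python, for example in a monitoring script that has already fetched both LSNs; the function names and the 10 GiB threshold are hypothetical and should be tuned to your disk.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a Postgres LSN like '16/B374D848' to a byte position.

    The text form is two hex numbers: the high and low 32 bits of the position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL Postgres must keep because the slot still needs them."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


ALERT_BYTES = 10 * 1024**3  # hypothetical threshold: 10 GiB of pinned WAL


def should_alert(current_lsn: str, restart_lsn: str) -> bool:
    return retained_wal_bytes(current_lsn, restart_lsn) > ALERT_BYTES
```

Alerting on bytes rather than time catches both failure modes from the paragraph above: a long snapshot and a connector that has stopped acknowledging.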

Use cases

Search index sync (Postgres → Elasticsearch). Cache invalidation (DB → Redis). Audit logs. Multi-region replication. Event-driven microservices, where each consumer reacts to database changes. CDC is one of the lowest-coupling ways to add asynchronous pipelines: the application keeps writing to its database unchanged, and new consumers subscribe to the change topics.
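For the search-index case, a consumer can translate each change event into lines for Elasticsearch's _bulk API. This sketch assumes the event is the Debezium payload (op/before/after) and uses the primary key as the document _id; to_bulk_lines and the index name are hypothetical. The index action creates or replaces the document, so redelivered events overwrite rather than duplicate.

```python
import json


def to_bulk_lines(event: dict, index: str, key: str = "id") -> list[str]:
    """Translate one change event into Elasticsearch _bulk NDJSON lines."""
    op = event["op"]
    if op in ("c", "r", "u"):
        doc = event["after"]
        action = {"index": {"_index": index, "_id": str(doc[key])}}
        return [json.dumps(action), json.dumps(doc)]
    if op == "d":
        # Delete actions take no document line.
        action = {"delete": {"_index": index, "_id": str(event["before"][key])}}
        return [json.dumps(action)]
    return []  # other ops (e.g. truncate) are ignored in this sketch


lines = to_bulk_lines({"op": "u", "before": {"id": 7, "status": "pending"},
                       "after": {"id": 7, "status": "shipped"}}, index="orders")
body = "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline
print(body)
```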

Debezium turns the DB log into a Kafka topic. Mind the snapshot window and the WAL pinning — both bite in production.