How to enable Change Data Capture (CDC) in Amazon AWS (Aurora) PostgreSQL Relational Data Service (RDS) using logical replication of the PostgreSQL write-ahead log (WAL)

Macharia Muguku
6 min readMar 20, 2025

--

AWS Aurora Postgres Logical Replication

Change data capture (CDC) refers to the process of identifying and capturing changes made to data in a database and then delivering those changes in real-time or near-real-time to a downstream process or system enabling seamless data synchronization between systems.

It captures changes (inserts, updates, and deletes) as they happen and delivers them to downstream systems.

Why it’s important

Keeps systems in sync: CDC ensures that data is consistent across multiple systems and environments. This is especially important for tracking changes (updates, deletes) in already replicated data in data warehouses or data lakes. E.g when you need a record of deletions in ETL/ELT processes.

Real-time data integration: CDC enables real-time data integration between systems. This makes it well suited for moving data into a stream processing solution like Apache Kafka.

Change Data Capture Methods

Log-based CDC: This method uses the database’s transaction log to capture changes. The transaction log contains a record of all changes made to the database. A CDC tool reads the transaction log and captures the changes. There are two types of log-based CDC:

  • Logical: Replicating data changes (inserts, updates, deletes) at a transaction level based upon their replication identity (usually a primary key). Logical replication uses a publish and subscribe model with one or more subscribers subscribing to one or more publications on a publisher node. This method is efficient and flexible because it only replicates changes that are needed.
  • Physical: Copying the entire database byte-by-byte (by using exact block addresses), ensuring data consistency at a very granular level but with less flexibility.
Log Based CDC

Query-based CDC: This method uses database queries to capture changes. The CDC tool runs queries against the database to identify changes. This method is less efficient than log-based CDC because it requires more resources to run queries against the database. You also need something like a timestamp (cursor field) in the data itself to be able to use this method.

Trigger-based CDC: This method uses triggers to capture changes in the database. When a change is made to a table, the trigger fires and captures the change. The change is then written to a change table, which is read by a downstream process.

Enabling CDC in Amazon (Aurora) PostgreSQL RDS

Prerequisites

  • An AWS account with the necessary permissions to create and modify resources
  • An Amazon RDS (Aurora) PostgreSQL instance
  • A postgreSQL user with the necessary permissions to assign replication permissions to other users as well as create publications and subscriptions slots. We’ll assume the RDS master user named admin with superuser access to the database. See the disclaimer in step one below.
  • A PostgreSQL user to use for replication. We’ll assume a user named cdc_user with initially only read-only access to the database.

Step 1: Provide additional permissions to read-only user

Using the admin user, grant REPLICATION permissions to the cdc_user user:

GRANT rds_replication TO cdc_user;

N/B:

  • SUPERUSER and REPLICATION attributes are not available to rds_superuser, which is the group your master user belongs to. There is a separate role called rdsadmin who has all the permissions but is not accessible to RDS users.
  • Replication on AWS PostgreSQL is only possible with the CREATE PUBLICATION and CREATE SUBSCRIPTION commands, available from Pgsql v10 (v10.6 in Aurora). You cannot use the old way with ALTER USER some_user REPLICATION.

Step 2: Enable logical replication on your AWS Postgres RDS or Aurora

  • On the AWS console, navigate to the Aurora and RDS dashboard, and select your Aurora PostgreSQL instance from the DB Instances.
  • On your DB cluster’s Configuration tab, find your DB cluster parameter group. Open it and note it's Resource Type, and Parameter Group Family.
Example Parameter Group Details
  • N/B: In AWS, you cannot modify a default parameter group and hence the need to create a new one (copy of default)
  • Navigate to the Parameter Groups section on the left sidebar, select the Custom tab, and click on Create Parameter Group.
Navigate To Parameter Groups
Create Parameter Group Button
  • Enter a parameter group name and description, select Aurora PostgreSQL Engine type for Aurora or PostgreSQL depending on your RDS type and select the Parameter Group Family and Type that matches the ones you noted earlier. Click Create.
Create Parameter Group
  • Select the newly created parameter group, click on edit, on the Modifiable parameters tab, search for rds.logical_replication and set it to 1 then click on Save changes.
Enable Logical Replication
  • Navigate back to the DB Instances section, select your Aurora PostgreSQL instance, and click on Modify.
Modify DB
  • In the Additional configuration section, select the newly created parameter group from the Parameter group dropdown. Click Continue and Apply immediately to apply the changes immediately without needing to wait for a maintenance window. Alternatively you can navigate back to the DB list, select your db, click on actions and then on Patch Now or restart it manually.
Apply New Parameter Group
  • Verify the values for wal_level and rds.logical_replication:
SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');

name | setting
-------------------------+---------
rds.logical_replication | on
wal_level | logical
(2 rows)

Step 3: Create a replication slot on your Postgres database

  • Create a replication slot called example_slot using the pgoutput plugin
SELECT pg_create_logical_replication_slot('example_slot', 'pgoutput');

Step 4: Create publication and replication identities for each Postgres table

For each table you want to replicate with CDC, follow the steps below:

  1. Add the replication identity (the method of distinguishing between rows) for each table you want to replicate:
ALTER TABLE tbl1 REPLICA IDENTITY DEFAULT;

N/B:

  • In rare cases, if your tables use data types that support TOAST or have very large field values, consider instead using replica identity type full:
ALTER TABLE tbl1 REPLICA IDENTITY FULL;

2. Create the Postgres publication. You should include all tables you want to replicate as part of the publication:

CREATE PUBLICATION example_publication FOR TABLE <tbl1, tbl2, tbl3>;`

N/B:

  • To replicate all the tables in a database, run this command:
  CREATE PUBLICATION example_publication FOR ALL TABLES;
  • The publication name is customizable. Refer to the Postgres docs if you need to add or remove tables from your publication in the future.

3. Verify that the source tables are added to the publication:

SELECT * FROM pg_publication_tables;

pubname | schemaname | tablename
--------------------+------------+------------
example_publication | public | tbl1
example_publication | public | tbl2
example_publication | public | tbl3
(2 rows)

4. Follow this tutorial to create subscriptions from a target (secondary) database

FIN!

--

--

Macharia Muguku
Macharia Muguku

Written by Macharia Muguku

A student of the world ; My brain has too many tabs open ; 🇰🇪; www.muguku.co.ke

No responses yet