Logical replication on large scale - 133 Databases - 140 Large tables - 2 replication slots per database

felipesodre · February 4, 2024, 7:17pm

Hi there, and thank you in advance for your help.

I need to build a pipeline for Data Lakes/Data Warehouses using PostgresSQL 14.8 as a source and S3 as a target, with AWS Managed Service DMS capturing the data from logical replication with test_decoding.

First question: Is 266 replication slots too much and I am being optimistic to believe that PostgreSQL 14.8 can handles it? Any foreseen challenges and general guidance?

For instance, so far the same solution is working fine on servers with low number of databases but ingests much more data daily. For context we replicate 140 tables per database divided in 2 replication slots per database with roughly 70 tables per replication slot.

Yesterday particularly, for the mentioned server with 133 databases, we noticed that when we ran backups (parallel pg_dump on all 133 databases flag “–jobs 8”), the CPU used by the logical replication user skyrocketed, as could be seen in attached screenshot; database wait events: “IO:ReorderBufferWrite”, “IO:WALRead”.

Any guidance as to why pg_dump would cause this issue, and specifically on “IO:ReorderBufferWrite”? We noticed that even after the pg_dump completed at 6:00am, the replication user was still behind on “IO:ReorderBufferWrite”.

Thank you so much for your time and consideration. Appreciate it!

Link for the screenshot: Logical-Replication-High-Usage hosted at ImgBB — ImgBB

Robert_Bernier · February 6, 2024, 3:09pm

Hi,

Is 266 replication slots too much and I am being optimistic to believe that PostgreSQL 14.8 can handles it?

I’ll be honest, I’ve never tested the limits of maximum replication slots possible. Offhand there’s two limitations you need to consider:

hardcoded limitations: in that case get onto the hackers maillist and ask there
network IO: your network admin can help you determine bottlenecks

The simple way of course is to build out a demo environment and try it out. If you choose that option I’d suggest looking into leveraging pgbench using a custom script.

… we ran backups (parallel pg_dump on all 133 databases flag “–jobs 8”), the CPU used by the logical replication user skyrocketed,…

Sounds like the Long Running Transaction bugaboo. You can also have a look at this blog too for further guidance/

Hope this helps.

Topic		Replies	Views
Can we do streaming and logical replication in postgresql Slave at same time? PostgreSQL	1	2785	January 23, 2023
Can we do streaming and logical replication in postgresql at same time? PostgreSQL postgresql	2	1016	June 13, 2022
Anyone using parallel replication with only one logical db for a large mission critical use case? Percona Server for MySQL 8.0	2	2475	January 14, 2021
Logical replication on postgres PostgreSQL	1	691	January 9, 2021
Will PostgreSQL logical replication (pglogical) need a quick lock to get the "started LSN" before fulldump like mysqldump + binlog replication~? PostgreSQL	1	480	March 27, 2024

Logical replication on large scale - 133 Databases - 140 Large tables - 2 replication slots per database

Related topics