Hi there, and thank you in advance for your help.
I need to build a pipeline for Data Lakes/Data Warehouses using PostgresSQL 14.8 as a source and S3 as a target, with AWS Managed Service DMS capturing the data from logical replication with test_decoding.
First question: Is 266 replication slots too much and I am being optimistic to believe that PostgreSQL 14.8 can handles it? Any foreseen challenges and general guidance?
For instance, so far the same solution is working fine on servers with low number of databases but ingests much more data daily. For context we replicate 140 tables per database divided in 2 replication slots per database with roughly 70 tables per replication slot.
Yesterday particularly, for the mentioned server with 133 databases, we noticed that when we ran backups (parallel pg_dump on all 133 databases flag “–jobs 8”), the CPU used by the logical replication user skyrocketed, as could be seen in attached screenshot; database wait events: “IO:ReorderBufferWrite”, “IO:WALRead”.
Any guidance as to why pg_dump would cause this issue, and specifically on “IO:ReorderBufferWrite”? We noticed that even after the pg_dump completed at 6:00am, the replication user was still behind on “IO:ReorderBufferWrite”.
Thank you so much for your time and consideration. Appreciate it!
Link for the screenshot: Logical-Replication-High-Usage hosted at ImgBB — ImgBB