Hello!
I followed this guide to setup Patroni: Deploying on RHEL or derivatives - Percona Distribution for PostgreSQL
And it works great, but I wanted to start using logical replication, so I found this post: How Patroni Addresses the Problem of the Logical Replication Slot Failover in a PostgreSQL Cluster
But I can’t seem to understand how is this supposed to work. In my patroni.yml I have this:
initdb:
- encoding: UTF8
- data-checksums
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
parameters:
archive_command: cp %p /opt/pgbackrest/%f #ignore this thing, attempts were made
archive_mode: 'on'
hot_standby: true
hot_standby_feedback: 'on'
logging_collector: 'on'
max_logical_replication_workers: 10
max_replication_slots: 10
max_wal_senders: 10
max_wal_size: 10GB
wal_keep_segments: 15
wal_level: logical
wal_log_hints: true
use_pg_rewind: true
use_slots: true
retry_timeout: 10
slots:
repslot:
database: thedb
plugin: pgoutput
type: logical
ttl: 30
According to the post, I need to create a new subscription (on the leader? not sure), but not sure which address should I specify either, currently I added (192.168.0.2 being the replica’s address):
\c bigdb
DROP PUBLICATION bigdbpub;
CREATE PUBLICATION bigdbpub FOR ALL TABLES;
CREATE SUBSCRIPTION bigdbsub CONNECTION 'host=192.168.0.2 port=5432 password=qwerty user=replicator dbname=bigdb' PUBLICATION bigdbpub WITH (copy_data = true, create_slot=false, enabled=true, slot_name=repslot);
After this I believe I did some switchover, but currently I have this as a result on my replica:
bigdb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn
| confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | conflicting
-----------------------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+------------
-+---------------------+------------+---------------+-----------+-------------
repslot | pgoutput | logical | 16391 | bigdb | f | t | 547246 | | 740953175 | 42/376DEE68
| 42/386E7F00 | reserved | | f | f
pg_16480_sync_16394_7358104072011291410 | pgoutput | logical | 16391 | bigdb | f | t | 560886 | | 740953250 | 42/376F1240
| | reserved | | f | f
pg_16480_sync_16397_7358104072011291410 | pgoutput | logical | 16391 | bigdb | f | t | 560887 | | 740953250 | 42/376F1240
| | reserved | | f | f
(3 rows)
(what even are those 2 new replication slots?)
And I have this on my master:
bigdb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirme
d_flush_lsn | wal_status | safe_wal_size | two_phase | conflicting
-------------------------+----------+-----------+--------+----------+-----------+--------+------------+-----------+--------------+-------------+---------
------------+------------+---------------+-----------+-------------
repslot | pgoutput | logical | 16391 | bigdb | f | f | | | 728472769 | 41/14EFC720 | 41/14FB4
B08 | reserved | | f | f
node2 | | physical | | | f | t | 559802 | 740819645 | 740819645 | 42/34F45DE8 |
| reserved | | f |
(2 rows)
(why is the physical replication slot still here?)
The replication does seem to work, but I don’t think it is working correctly since I get these messages from time to time:
2024-04-18 17:54:19.780 EEST [553403] LOG: background worker "logical replication worker" (PID 571107) exited with exit code 1
2024-04-18 17:54:19.780 EEST [571108] ERROR: duplicate key value violates unique constraint "bigdb_serial_nr_key"
2024-04-18 17:54:19.780 EEST [571108] DETAIL: Key (serial_nr)=(123456789) already exists.
2024-04-18 17:54:19.780 EEST [571108] CONTEXT: COPY bigdb, line 1
My question is: how should logical replication be set up?
The biggest confusion is when I specify “192.168.0.2” in the subscription. Wouldn’t the subscription fail when the “leader” has changed? Should the “physical” replication slot be there (I believe it is creating the duplicate key error, does not happen with physical replication only)?