Physical replication slot created through patroni configuration has retained the wal beyond default max_wal_size = 1 GB

We have created a 3-node high-availability and failover cluster with PostgreSQL, Patroni and etcd. The Patroni configuration for this cluster is as follows:

namespace: ${NAMESPACE}
scope: ${SCOPE}
name: ${NODE_NAME}

restapi:
  listen: 0.0.0.0:8008
  connect_address: ${NODE_IP}:8008

etcd3:
  host: ${NODE_IP}:2379

bootstrap:
  # this section will be written into Etcd:///config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    slots:
      percona_cluster_1:
        type: physical

    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        wal_level: replica
        hot_standby: on
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: on
        logging_collector: 'on'

  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums

  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 127.0.0.1/32 trust
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
    - host all all ::0/0 md5

  # Some additional users which need to be created after initializing new cluster
  users:
    admin:
      password: pwd123
      options:
        - createrole
        - createdb
    percona:
      password: pwd123
      options:
        - createrole
        - createdb

postgresql:
  cluster_name: cluster_1
  listen: 0.0.0.0:5432
  connect_address: ${NODE_IP}:5432
  data_dir: ${DATA_DIR}
  bin_dir: ${PG_BIN_DIR}
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: pwd987
    superuser:
      username: postgres
      password: pwd789
  parameters:
    unix_socket_directories: /var/run/postgresql/
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: 'fast'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

The cluster is running with this configuration. The physical replication slot “percona_cluster_1”, created through this Patroni configuration, has retained 3 TB of WAL, and the slot is orphaned (inactive).

How can we resolve this issue and prevent it from happening in the future?

Hi Deepak,

The cause is this block in your configuration:

slots:
  percona_cluster_1:
    type: physical

That defines a permanent physical replication slot. If no replica is actively consuming it, the slot's restart_lsn never advances, so PostgreSQL keeps all WAL from that point onward and pg_wal can grow indefinitely. The Patroni docs note that permanent slots are preserved across failover and should be used with caution.
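To confirm which slot is holding the WAL and how much it is holding, a query along these lines can be run on the primary. This is a sketch assuming PostgreSQL 10 or later (for pg_current_wal_lsn() and pg_wal_lsn_diff()); the retained_wal column name is just an alias chosen for this example.

```sql
-- Run on the primary: pg_current_wal_lsn() raises an error on a standby.
-- retained_wal = distance from each slot's restart_lsn to the current WAL
-- position, i.e. roughly how much WAL that slot forces the server to keep.
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
-- Largest retention first; slots that never reserved WAL (NULL restart_lsn) go last.
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
```

A slot with active = false and a large, growing retained_wal is the one pinning pg_wal.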

Since you don’t want to use the slot anymore, you can remove it from Patroni DCS first:

patronictl -c /path/to/patroni.yml edit-config

Remove:

slots:
  percona_cluster_1:
    type: physical

Then drop the slot on the primary:

SELECT pg_drop_replication_slot('percona_cluster_1');
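To check that the drop worked, something like the following can be run on the primary. Note that pg_drop_replication_slot() raises an error if a walsender is still using the slot. Old WAL segments are removed at checkpoint time, so disk space is usually reclaimed after the next checkpoint; issuing CHECKPOINT manually (as a superuser) is one way to speed that up. This is a sketch, not a required step.

```sql
-- Should return zero rows once the slot is gone.
SELECT slot_name, active, active_pid
FROM pg_replication_slots
WHERE slot_name = 'percona_cluster_1';

-- Old segments are removed at checkpoint time; forcing one reclaims space sooner.
CHECKPOINT;
```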

Additionally, if you are using PostgreSQL 13 or later, consider adding a WAL retention cap:

postgresql:
  parameters:
    max_slot_wal_keep_size: '100GB'

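Pick a value your disk can tolerate. If a slot falls further behind than this limit, PostgreSQL invalidates the slot and the WAL it was holding can be removed. A replica using that slot may then need to be reinitialized, but the primary will no longer accumulate terabytes of WAL.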
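On PostgreSQL 13 and later, pg_replication_slots also exposes wal_status and safe_wal_size, which show how close each slot is to the cap. A monitoring query could look like this (a sketch; alert thresholds are left to you):

```sql
-- Confirm the cap is in effect; -1 means unlimited retention.
SHOW max_slot_wal_keep_size;

-- wal_status: 'reserved' / 'extended' = WAL still retained,
--             'unreserved' = some required WAL will be removed at the next checkpoint,
--             'lost' = required WAL is gone and the slot is no longer usable.
-- safe_wal_size: bytes that can still be written before the slot is in danger
--                of becoming 'lost' (NULL if already lost or if the cap is -1).
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(safe_wal_size) AS safe_wal_size
FROM pg_replication_slots;
```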

Let me know your thoughts on it.

Warm regards,

Nam.

Hi Nam,

Thanks for your reply.

Actually, I am already aware of this solution. My purpose in posting this issue in the Percona community is to find an approach other than removing and dropping the unusable physical replication slot.

Thanks & Regards,

Deepak

Hi Deepak,

In the topic, you mentioned the following:

Physical replication slot created through patroni configuration has retained the wal beyond default max_wal_size = 1 GB

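I believe there is a misunderstanding here, since this is not what max_wal_size does: it is a soft limit on how much WAL accumulates between automatic checkpoints, and it does not cap the WAL retained for replication slots.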

As Nam mentioned, that behavior is controlled by max_slot_wal_keep_size (PostgreSQL 13+), which you most likely don't have set; its default of -1 lets slots retain unlimited WAL.
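To see which of these settings are actually in effect on a node, pg_settings can be queried directly. This is a sketch: wal_keep_segments exists only up to PostgreSQL 12 and wal_keep_size only from 13 onward, so the query simply returns whichever ones your server has.

```sql
-- max_wal_size: soft limit for WAL between automatic checkpoints (default 1GB).
-- max_slot_wal_keep_size: cap on WAL kept for slots (PostgreSQL 13+, -1 = unlimited).
-- wal_keep_size replaced wal_keep_segments in PostgreSQL 13.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'max_slot_wal_keep_size',
               'wal_keep_size', 'wal_keep_segments');
```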

Please consult the online documentation for more information about this variable: