Hello everyone,
I’ve being trying hard to understand what’s going on with my configuration, maybe I’m the problem.
For context I’ve a two node cluster, with patroni and postgresql in it, plus three node cluster of etcd. Everything works as expected, the failover, the switchover, etc.
My issue starts when I try to work with backups, in this case using the pgBackRest. I’m trying to restore my PostgreSQL to a full backup I have done in pgBackRest, without much of a success. The problem happens when I delete a table from a list of tables I’ve in my database, and when I do the full restore my table it’s not there. Per contrast, if I do a full backup, then delete a table and also create a new one, when I do the restore from the full backup (after deleting my data* folder of PostgreSQL), my table it’s still missing and the one I created it’s there.
I understood that might be something with the WAL or archive, where patroni tries to bring everything to the most recent state (the command I was using to do the restore it’s “sudo -iu postgres pgbackrest --stanza=cluster --log-level-console=info --type=none --archive-mode=off --set=20240126-184132F restore”), so then tried a different approach.
Instead of restoring everything from the full backup, I tried a PITR, so I did the same test of delete and creating a table, did a delta restore (sudo -iu postgres pgbackrest --stanza=cluster --delta --log-level-console=detail --type=time “–target=2024-01-26 21:30:22” --target-action=promote restore), which also didn’t work. My deleted table was yet to be found, and my new created one, was still there.
So I redid this last step, but instead of starting the postgresql using the patroni service, this time I started the postgresql directly (sudo -iu postgres /usr/pgsql-15/bin/pg_ctl --pgdata=/postgresql/data start) and to my surprise, then I see my old table and also no trace of my new created table, which was the expected behavior I was hoping to get. Then I finally stop the pg_ctl and restart the patroni service, and everything was there still.
To verify, redid the last delta again, but this time using the patroni service directly (instead of using pg_ctl), with no surprise my table was missing and my new table was there. Redid again everything, now with the pg_ctl, checked the missing table, which was there and the new one not (the expected), but when I started the patroni service got this message: “My wal position exceeds maximum replication lag”.
Right now my patroni is recognizing the postgresql only as a replica and will not leave this state no matter what I do.
If someone can shed light of what am I doing wrong here, or what’s wrong with my understanding of backups I would really appreciate that (in my defense, I come from MSSQL, please forgive me).
I leave here a part of the config I have for the patroni, a pretty generic one:
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: replica
hot_standby: "on"
logging_collector: 'on'
max_wal_senders: 5
max_replication_slots: 5
wal_log_hints: "on"
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 10.10.1.1/32 md5
- host replication replicator 10.10.1.2/32 md5
- host replication replicator 127.0.0.1/32 trust
- local all all md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: strongpasshere
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.10.1.1:5432
data_dir: "/postgresql/data"
bin_dir: "/usr/pgsql-15/bin"
pgpass: /tmp/pgpass0
authentication:
replication:
username: replicator
password: strongpasshere
superuser:
username: postgres
password: strongpasshere
parameters:
unix_socket_directories: "/var/run/postgresql"
And also a my stanza for pgBackRest:
global]
repo1-path=/postgresql/backup/
start-fast=y
log-level-console=info
log-path=/postgresql/backup/log
repo1-retention-archive=5
repo1-retention-archive-type=full
repo1-retention-full=2
repo1-retention-diff=6
[cluster]
pg1-path=/postgresql/data
pg1-port=5432
pg1-user=postgres
pg1-socket-path=/var/run/postgresql
Thanks in advance