Patroni and pgBackRest - full restore problem

Hello everyone,

I’ve being trying hard to understand what’s going on with my configuration, maybe I’m the problem.

For context I’ve a two node cluster, with patroni and postgresql in it, plus three node cluster of etcd. Everything works as expected, the failover, the switchover, etc.

My issue starts when I try to work with backups, in this case using the pgBackRest. I’m trying to restore my PostgreSQL to a full backup I have done in pgBackRest, without much of a success. The problem happens when I delete a table from a list of tables I’ve in my database, and when I do the full restore my table it’s not there. Per contrast, if I do a full backup, then delete a table and also create a new one, when I do the restore from the full backup (after deleting my data* folder of PostgreSQL), my table it’s still missing and the one I created it’s there.

I understood that might be something with the WAL or archive, where patroni tries to bring everything to the most recent state (the command I was using to do the restore it’s “sudo -iu postgres pgbackrest --stanza=cluster --log-level-console=info --type=none --archive-mode=off --set=20240126-184132F restore”), so then tried a different approach.

Instead of restoring everything from the full backup, I tried a PITR, so I did the same test of delete and creating a table, did a delta restore (sudo -iu postgres pgbackrest --stanza=cluster --delta --log-level-console=detail --type=time “–target=2024-01-26 21:30:22” --target-action=promote restore), which also didn’t work. My deleted table was yet to be found, and my new created one, was still there.

So I redid this last step, but instead of starting the postgresql using the patroni service, this time I started the postgresql directly (sudo -iu postgres /usr/pgsql-15/bin/pg_ctl --pgdata=/postgresql/data start) and to my surprise, then I see my old table and also no trace of my new created table, which was the expected behavior I was hoping to get. Then I finally stop the pg_ctl and restart the patroni service, and everything was there still.

To verify, redid the last delta again, but this time using the patroni service directly (instead of using pg_ctl), with no surprise my table was missing and my new table was there. Redid again everything, now with the pg_ctl, checked the missing table, which was there and the new one not (the expected), but when I started the patroni service got this message: “My wal position exceeds maximum replication lag”.

Right now my patroni is recognizing the postgresql only as a replica and will not leave this state no matter what I do.

If someone can shed light of what am I doing wrong here, or what’s wrong with my understanding of backups I would really appreciate that (in my defense, I come from MSSQL, please forgive me).

I leave here a part of the config I have for the patroni, a pretty generic one:

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: 'on'
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: "on"
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.10.1.1/32 md5
    - host replication replicator 10.10.1.2/32 md5
    - host replication replicator 127.0.0.1/32 trust
    - local all all md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: strongpasshere
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.10.1.1:5432
  data_dir: "/postgresql/data"
  bin_dir: "/usr/pgsql-15/bin"
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: strongpasshere
    superuser:
      username: postgres
      password: strongpasshere
  parameters:
    unix_socket_directories: "/var/run/postgresql"

And also a my stanza for pgBackRest:

global]
repo1-path=/postgresql/backup/
start-fast=y
log-level-console=info
log-path=/postgresql/backup/log
repo1-retention-archive=5
repo1-retention-archive-type=full
repo1-retention-full=2
repo1-retention-diff=6

[cluster]
pg1-path=/postgresql/data
pg1-port=5432
pg1-user=postgres
pg1-socket-path=/var/run/postgresql

Thanks in advance

Hello Pirolas,

Hope you are doing well! I have reviewed your query.

The process invocation in Patroni doesn’t work the same as normal pg_ctl we use. Patroni sets parameters specifically for its use. If we want manual configuration in-place, we need to instruct Patroni for the same.

In this case, when you restore your database from Pgbackrest and start it using Patroni, Patroni omits recover-related parameters, such as recover_target_time, recover_target_action and others.

You use the below configuration and try to perform restore by starting Patroni.

bootstrap:
  method: pgbackrest
    pgbackrest:
      command: pgbackrest --stanza=cluster --log-level-console=info --type=none --archive-mode=off --set=20240126-184132F restore
      keep_existing_recovery_conf: true
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: 'on'
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: "on"
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.10.1.1/32 md5
    - host replication replicator 10.10.1.2/32 md5
    - host replication replicator 127.0.0.1/32 trust
    - local all all md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: strongpasshere
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.10.1.1:5432
  data_dir: "/postgresql/data"
  bin_dir: "/usr/pgsql-15/bin"
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: strongpasshere
    superuser:
      username: postgres
      password: strongpasshere
  parameters:
    unix_socket_directories: "/var/run/postgresql"

Please feel free to revert to us if you find any issue.

Hello Pirolas,

Adding my two cents: As per the previous response, Patroni has it’s own ways of working with pg_backrest, once it’s been correctly configured. However, if I understand your question I think you need to understand how to administer pg_backrest itself. In that case I’d like to suggest you perform a little googling using the following string in the search bar:

site: percona.com blog pgbackrest

Hope this helps.

Hello,

Just getting back on this and to leave it here for anyone else facing the same issue:

Make sure that no one is working on the same machine at the same time as you. You might also want to check the configuration at the etcd instead of the file, because once the cluster it’s started, it’s there were you need to look.

Ultimately the problem lied at the configuration of patroni, which had a config to always restore to the last possible PITR.

This configuration could be seen with the ‘patronictl edit’, but stupid me, I was looking at the file, because I really thought I was the only one looking at the issue - but clearly that was not the case.

It all ended up making sense after all.

Sorry for the post, thank you for both replies!