pgBackRest restore error in Job + default restores not working

Description:

Hi,

We’ve been getting the following output in our restore logs when running:

  • pgbackrest restore --type=default --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal

WARN: unknown group 'postgres' in backup manifest mapped to current group
2023-07-12 19:38:47.850 GMT [18] LOG: starting PostgreSQL 14.7 - Percona Distribution on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2023-07-12 19:38:47.850 GMT [18] LOG: listening on IPv6 address "::1", port 5432
2023-07-12 19:38:47.850 GMT [18] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-07-12 19:38:47.864 GMT [18] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-07-12 19:38:47.883 GMT [19] LOG: database system was interrupted; last known up at 2023-07-12 19:33:53 GMT
2023-07-12 19:38:47.928 GMT [19] LOG: starting archive recovery
2023-07-12 19:38:48.033 GMT [19] LOG: restored log file "000000010000000000000004" from archive
2023-07-12 19:38:48.104 GMT [19] LOG: redo starts at 0/4000028
2023-07-12 19:38:48.111 GMT [19] LOG: consistent recovery state reached at 0/4000170
2023-07-12 19:38:48.112 GMT [18] LOG: database system is ready to accept read-only connections
2023-07-12 19:38:48.237 GMT [19] LOG: restored log file "000000010000000000000005" from archive
2023-07-12 19:38:49.145 GMT [19] LOG: restored log file "000000010000000000000006" from archive
2023-07-12 19:38:49.294 GMT [19] LOG: restored log file "000000010000000000000007" from archive
2023-07-12 19:38:49.382 GMT [19] LOG: redo done at 0/7002FA8 system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 1.27 s
2023-07-12 19:38:49.382 GMT [19] LOG: last completed transaction was at log time 2023-07-12 19:35:56.213231+00
2023-07-12 19:38:49.480 GMT [19] LOG: restored log file "000000010000000000000007" from archive
2023-07-12 19:38:49.729 GMT [19] LOG: selected new timeline ID: 2
2023-07-12 19:38:49.820 GMT [19] LOG: archive recovery complete
2023-07-12 19:38:50.048 GMT [18] LOG: database system is ready to accept connections
2023-07-12 19:38:50.050 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:50.050 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:51.053 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:51.053 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:51.153 GMT [18] LOG: received fast shutdown request
2023-07-12 19:38:51.162 GMT [18] LOG: aborting any active transactions
2023-07-12 19:38:51.164 GMT [18] LOG: background worker "logical replication launcher" (PID 43) exited with exit code 1
2023-07-12 19:38:51.164 GMT [22] LOG: shutting down
2023-07-12 19:38:51.517 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:51.517 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:51.517 GMT [42] WARNING: archiving write-ahead log file "00000002.history" failed too many times, will try again later
2023-07-12 19:38:51.519 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:51.519 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:52.522 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:52.522 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:53.525 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:53.525 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:53.525 GMT [42] WARNING: archiving write-ahead log file "00000002.history" failed too many times, will try again later
2023-07-12 19:38:53.528 GMT [18] LOG: database system is shut down

The restore job seems to end in a fatal error. However, when I use --type=time and select a backup, everything is restored properly.
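One way to check whether the log really contains a fatal error is to count lines by severity. In the excerpt quoted above, every line is at LOG, DETAIL, or WARNING level and none is FATAL or PANIC. The helper below is only a sketch: the function name `count_level` and the file name `restore.log` are made up for illustration, and it assumes the log has been saved to a file in the "timestamp TZ [pid] LEVEL: message" layout shown in this post.

```shell
# Hypothetical helper: count PostgreSQL log lines at a given severity.
# Assumes the "timestamp TZ [pid] LEVEL: message" layout quoted in this post.
count_level() {
    level="$1"   # e.g. FATAL, PANIC, WARNING, LOG
    file="$2"    # path to the saved log
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so the status is ignored to keep the helper usable
    # in scripts that run with "set -e".
    grep -c -E "\] ${level}:" "$file" || true
}
```

For example, `count_level FATAL restore.log` and `count_level PANIC restore.log` would both be expected to print 0 if the saved log matches the excerpt in this post.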

When using --type=default, the changes I made are not reverted.

Steps to Reproduce:

Run version 2.2 of the Operator on an OpenShift 4.10 cluster and start a restore job. The options do not seem to matter, but we usually use --type=default.

Version:

2.2.0

Expected Result:

Default-type restores should restore the latest full backup, but nothing seems to be reverted. With a specific target and --type=time, it does work.
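For comparison, here is a sketch of how the recovery types differ, going by the pgBackRest documentation: --type=default recovers to the end of the archive stream, --type=immediate stops once the restored backup is consistent, and --type=time stops at the timestamp given with --target. That would fit the log above, where WAL segments 4 through 7 are replayed and the last completed transaction (19:35:56) is later than the backup (19:33:53). The wrapper function names are hypothetical, and the stanza, paths, and repo are copied from the command in this post.

```shell
# Hypothetical wrappers around the restore command quoted in this post.
restore_base() {
    # "$@" carries the recovery-type options appended by the wrappers below.
    pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 \
        --delta --link-map=pg_wal=/pgdata/pg14_wal "$@"
}

# default: replay WAL to the end of the archive stream, so changes made
# after the backup are applied again.
restore_default() { restore_base --type=default; }

# immediate: stop as soon as the restored backup is consistent.
restore_immediate() { restore_base --type=immediate; }

# time: replay WAL only up to the timestamp passed as the first argument.
restore_to_time() { restore_base --type=time --target="$1"; }
```

A call such as `restore_to_time "2023-07-12 19:34:00+00"` uses a placeholder timestamp, not a value from the report. In the Operator, the same options would presumably be passed through the restore job rather than run by hand.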

Hi, I think it should be --type=full, not --type=default.