Pgbackrest restore error in Job + default restores not working

Arnout_Hoebreckx · July 12, 2023, 10:17pm

Description:

Hi,

We’ve been getting the following error in our restore logs:

pgbackrest restore --type=default --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal

WARN: unknown group 'postgres' in backup manifest mapped to current group

2023-07-12 19:38:47.850 GMT [18] LOG: starting PostgreSQL 14.7 - Percona Distribution on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit

2023-07-12 19:38:47.850 GMT [18] LOG: listening on IPv6 address "::1", port 5432

2023-07-12 19:38:47.850 GMT [18] LOG: listening on IPv4 address "127.0.0.1", port 5432

2023-07-12 19:38:47.864 GMT [18] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"

2023-07-12 19:38:47.883 GMT [19] LOG: database system was interrupted; last known up at 2023-07-12 19:33:53 GMT

2023-07-12 19:38:47.928 GMT [19] LOG: starting archive recovery

2023-07-12 19:38:48.033 GMT [19] LOG: restored log file "000000010000000000000004" from archive

2023-07-12 19:38:48.104 GMT [19] LOG: redo starts at 0/4000028

2023-07-12 19:38:48.111 GMT [19] LOG: consistent recovery state reached at 0/4000170

2023-07-12 19:38:48.112 GMT [18] LOG: database system is ready to accept read-only connections

2023-07-12 19:38:48.237 GMT [19] LOG: restored log file "000000010000000000000005" from archive

2023-07-12 19:38:49.145 GMT [19] LOG: restored log file "000000010000000000000006" from archive

2023-07-12 19:38:49.294 GMT [19] LOG: restored log file "000000010000000000000007" from archive

2023-07-12 19:38:49.382 GMT [19] LOG: redo done at 0/7002FA8 system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 1.27 s

2023-07-12 19:38:49.382 GMT [19] LOG: last completed transaction was at log time 2023-07-12 19:35:56.213231+00

2023-07-12 19:38:49.480 GMT [19] LOG: restored log file "000000010000000000000007" from archive

2023-07-12 19:38:49.729 GMT [19] LOG: selected new timeline ID: 2

2023-07-12 19:38:49.820 GMT [19] LOG: archive recovery complete

2023-07-12 19:38:50.048 GMT [18] LOG: database system is ready to accept connections

2023-07-12 19:38:50.050 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:50.050 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:51.053 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:51.053 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:51.153 GMT [18] LOG: received fast shutdown request

2023-07-12 19:38:51.162 GMT [18] LOG: aborting any active transactions

2023-07-12 19:38:51.164 GMT [18] LOG: background worker "logical replication launcher" (PID 43) exited with exit code 1

2023-07-12 19:38:51.164 GMT [22] LOG: shutting down

2023-07-12 19:38:51.517 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:51.517 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:51.517 GMT [42] WARNING: archiving write-ahead log file "00000002.history" failed too many times, will try again later

2023-07-12 19:38:51.519 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:51.519 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:52.522 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:52.522 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:53.525 GMT [42] LOG: archive command failed with exit code 1

2023-07-12 19:38:53.525 GMT [42] DETAIL: The failed archive command was: false

2023-07-12 19:38:53.525 GMT [42] WARNING: archiving write-ahead log file "00000002.history" failed too many times, will try again later

2023-07-12 19:38:53.528 GMT [18] LOG: database system is shut down

The restore seemingly ends in a fatal error. However, when using --type=time and selecting a backup, everything is restored properly.
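
One way to check whether the restore itself failed is to scan the job log for entries above WARNING level. The sketch below is only an illustration: the helper name `count_fatal_lines` and the temporary sample file are hypothetical, and it assumes PostgreSQL's usual `[pid] SEVERITY:` prefix. In the log above, the only repeated failure is `archive command failed`, with the command reported as `false`. That appears after `archive recovery complete` and may simply mean archiving is switched off for the duration of the restore job (an assumption, not confirmed in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines whose severity is ERROR, FATAL, or PANIC.
# Assumes the "<timestamp> [pid] SEVERITY: message" layout seen in this post.
count_fatal_lines() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -cE '\] (ERROR|FATAL|PANIC):' "$1" || true
}

# A few lines copied from the log in this post, written to a temporary file.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
2023-07-12 19:38:49.820 GMT [19] LOG: archive recovery complete
2023-07-12 19:38:50.050 GMT [42] LOG: archive command failed with exit code 1
2023-07-12 19:38:50.050 GMT [42] DETAIL: The failed archive command was: false
2023-07-12 19:38:53.528 GMT [18] LOG: database system is shut down
EOF

echo "fatal lines: $(count_fatal_lines "$sample_log")"
```

A count of zero would suggest that recovery completed and that the warnings come from the archiver rather than from the restore.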

With --type=default, the changes I made after the backup are not reverted.

Steps to Reproduce:

We are running version 2.2 of the Operator on an OpenShift 4.10 cluster. Simply running a restore job reproduces the problem. The options do not seem to matter, but we usually use --type=default.

Version:

2.2.0

Expected Result:

A default-type restore should restore the latest full backup, but nothing seems to be reverted. A targeted restore with --type=time does work.

Slavisa_Milojkovic · July 14, 2023, 7:27am

Hi, I think it should be --type=full, not --type=default.
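
For reference, pgBackRest's documented values for the restore `--type` option include `default`, `immediate`, `time`, `xid`, `name`, `lsn`, `preserve`, `standby`, and `none`. `full` is a backup type (`pgbackrest backup --type=full`) rather than a restore type. With `default`, recovery replays WAL to the end of the archive stream, which is consistent with the log above, where segments 4 through 7 are replayed. Changes made after the backup would therefore come back. The sketch below only prints candidate commands and does not run them. The `restore_cmd` helper and the target timestamp are illustrative assumptions; the stanza and paths are copied from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a pgbackrest restore command for a given
# recovery type instead of executing it. Stanza and paths come from the post.
restore_cmd() {
  local type="$1" target="${2:-}"
  local cmd="pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal --type=${type}"
  # --target only applies to targeted types such as time, xid, name, or lsn.
  if [ -n "$target" ]; then
    cmd="${cmd} --target=\"${target}\""
  fi
  printf '%s\n' "$cmd"
}

restore_cmd default                          # replay WAL to the end of the archive
restore_cmd immediate                        # stop as soon as the backup is consistent
restore_cmd time "2023-07-12 19:34:00+00"    # illustrative timestamp, not from the thread
```

In the Operator these flags are normally supplied as restore options rather than run by hand, so the printed commands are meant for comparison only.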
