PostgreSQL failed to start with no "archive.info" file

VNuser · January 3, 2024, 10:49am

Hi,

We accidentally lost files from the backrest repository (backrestrepo)! We had cluster issues that stopped the database the hard way, and during startup the database tries to restore:

LOG

FileMissingError: raised from remote-0 ssh protocol on 'zabbix-ha-db-pg-db-backrest-shared-repo': unable to open missing file '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info' for read
      FileMissingError: raised from remote-0 ssh protocol on 'zabbix-ha-db-pg-db-backrest-shared-repo': unable to open missing file '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info.copy' for read
      HINT: archive.info cannot be opened but is required to push/get WAL segments.
      HINT: is archive_command configured correctly in postgresql.conf?
      HINT: has a stanza-create been performed?
      HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
ERROR: [103]: unable to find a valid repository
2024-01-03 10:45:32.222 UTC [48874] FATAL:  the database system is starting up
2024-01-03 10:45:33.243 UTC [48879] FATAL:  the database system is starting up
2024-01-03 10:45:33.472 UTC [48881] FATAL:  the database system is starting up

Please advise: can we somehow start it while skipping the archive and backup steps, so we can take a dump and migrate to a clean setup with the existing database content?

Thanks in advance

Robert_Bernier · January 3, 2024, 6:39pm

Hi,

I’ll be honest, this has me concerned …

ERROR: [103]: unable to find a valid repository

First task: locate and physically review your WALs.
Next: review your Postgres logs and identify specifically which WAL it stalls on.
Finally: if you are indeed missing WALs, then under the right conditions it is still possible to recover your STANDBY using pg_rewind.
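For the first step, here is a minimal sketch. It assumes you have direct filesystem access to the data directory. The pods in this thread use /var/lib/postgresql/data, so pg_wal would be /var/lib/postgresql/data/pg_wal.

WAL segment files are named with 24 hex digits: an 8-digit timeline ID followed by the segment number, written as two 8-digit halves. The helper names are hypothetical. `find_gaps` is only a heuristic: it assumes the default 16 MB segment size (256 segments per "log" value). Recycled or already-removed segments can also make a listing look incomplete when nothing is actually wrong.

```python
import os
import re
from typing import List, NamedTuple, Optional

# WAL segment names: TTTTTTTT (timeline) + XXXXXXXX + YYYYYYYY (segment number halves).
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


class WalSegment(NamedTuple):
    timeline: int
    log: int
    seg: int
    name: str


def parse_wal_name(name: str) -> Optional[WalSegment]:
    """Decode a WAL segment file name; return None for anything else
    (e.g. *.history, *.partial, the archive_status directory)."""
    if not WAL_NAME.match(name):
        return None
    return WalSegment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16), name)


def list_wal_segments(pg_wal_dir: str) -> List[WalSegment]:
    """WAL segments found in pg_wal_dir, sorted by timeline, then position."""
    found = (parse_wal_name(n) for n in os.listdir(pg_wal_dir))
    return sorted((s for s in found if s is not None), key=lambda s: (s.timeline, s.log, s.seg))


def find_gaps(segments: List[WalSegment], segs_per_log: int = 256) -> List[str]:
    """Heuristic: names missing between consecutive segments of the same
    timeline. Expects sorted input; 256 assumes 16 MB segments."""
    missing: List[str] = []
    for prev, cur in zip(segments, segments[1:]):
        if prev.timeline != cur.timeline:
            continue  # a timeline switch is not treated as a gap
        start = prev.log * segs_per_log + prev.seg
        end = cur.log * segs_per_log + cur.seg
        for n in range(start + 1, end):
            missing.append(f"{cur.timeline:08X}{n // segs_per_log:08X}{n % segs_per_log:08X}")
    return missing
```

For example, `find_gaps(list_wal_segments("/var/lib/postgresql/data/pg_wal"))` lists candidate missing names. You can then compare them with the WAL file named in the server log, if the log names one.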

With a little luck I’ve fully understood your situation. By all means fill in any details that I may not be totally clear about.

Hope this helps.

regards

Robert

VNuser · January 4, 2024, 8:11am

Hello,

Thanks for the quick reply!

We lost the replicas; the leader is still present, but it is displayed as a replica!

bash-4.4$ patronictl list
+ Cluster: zabbix-ha-db-pg-db (7251563696400589006) +---------+--------------+----+-----------+-----------------+
| Member                              | Host        | Role    | State        | TL | Lag in MB | Pending restart |
+-------------------------------------+-------------+---------+--------------+----+-----------+-----------------+
| zabbix-ha-db-pg-db-66d6b974df-r9tb7 | 10.2.217.14 | Replica | start failed |    |   unknown | *               |
+-------------------------------------+-------------+---------+--------------+----+-----------+-----------------+

pg_rewind gave us the following result, because the database remains in the startup phase:

pg_rewind -D /var/lib/postgresql/data/ --source-server="port=5432 user=postgres dbname=zabbix"
pg_rewind: fatal: connection to server on socket "/run/postgresql/.s.PGSQL.5432" failed: FATAL:  the database system is starting up

The following log was generated during startup on a maintenance pod with the same data directory:

2024-01-04 07:50:44.934 UTC [680] LOG:  starting PostgreSQL 14.10 on x86_64-alpine-linux-musl, compiled by gcc (Alpine 13.2.1_git20231014) 13.2.1 20231014, 64-bit
2024-01-04 07:50:44.934 UTC [680] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-01-04 07:50:44.936 UTC [680] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-01-04 07:50:44.941 UTC [682] LOG:  database system was shut down in recovery at 2024-01-04 07:44:18 UTC
2024-01-04 07:50:44.941 UTC [682] WARNING:  specified neither primary_conninfo nor restore_command
2024-01-04 07:50:44.941 UTC [682] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-01-04 07:50:44.941 UTC [682] LOG:  entering standby mode
2024-01-04 08:00:21.468 UTC [707] FATAL:  the database system is starting up
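On PostgreSQL 12 and later (this server is 14.10), the "entering standby mode" line is expected when a standby.signal file is present in the data directory. A recovery.signal file requests targeted recovery instead, and if both files exist, standby mode takes precedence.

With neither primary_conninfo nor restore_command set, a standby can only replay WAL that appears in pg_wal, as the HINT above says. The small helper below is hypothetical. It reports which mode the signal files select, based only on the file names, and does not read postgresql.conf:

```python
import os
from typing import Iterable, Optional


def recovery_mode_from_names(names: Iterable[str]) -> Optional[str]:
    """Recovery mode a PostgreSQL 12+ server selects from its signal files."""
    present = set(names)
    if "standby.signal" in present:
        return "standby"  # wins even if recovery.signal also exists
    if "recovery.signal" in present:
        return "targeted recovery"
    return None  # no signal file: ordinary startup (crash recovery only)


def recovery_mode(data_dir: str) -> Optional[str]:
    """Same check against an actual data directory listing."""
    return recovery_mode_from_names(os.listdir(data_dir))
```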

This log is from the failed Percona cluster leader node.

The only difference is that archive and recovery commands are also included in postgresql.conf:

2024-01-04 08:07:03.542 UTC [1015918] FATAL:  the database system is starting up
WARN: repo1: [FileMissingError] unable to load info file '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info' or '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info.copy':
      FileMissingError: raised from remote-0 ssh protocol on 'zabbix-ha-db-pg-db-backrest-shared-repo': unable to open missing file '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info' for read
      FileMissingError: raised from remote-0 ssh protocol on 'zabbix-ha-db-pg-db-backrest-shared-repo': unable to open missing file '/backrestrepo/zabbix-ha-db-pg-db-backrest-shared-repo/archive/db/archive.info.copy' for read
      HINT: archive.info cannot be opened but is required to push/get WAL segments.
      HINT: is archive_command configured correctly in postgresql.conf?
      HINT: has a stanza-create been performed?
      HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
ERROR: [103]: unable to find a valid repository
2024-01-04 08:07:03.834 UTC [1015923] FATAL:  the database system is starting up
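Both logs report that archive.info and its backup copy, archive.info.copy, are missing. Per the HINT, pgBackRest needs this file to push or get WAL segments.

The small helper below is hypothetical. It collects every path pgBackRest reported as missing, so each one can be checked in the repository by hand. It only matches the `unable to open missing file '...'` wording seen in these logs.

```python
import re
from typing import List

# Matches the wording used in the FileMissingError lines above.
MISSING = re.compile(r"unable to open missing file '([^']+)'")


def missing_repo_files(log_text: str) -> List[str]:
    """Unique paths reported as missing, in first-seen order."""
    seen: List[str] = []
    for path in MISSING.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen
```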

Robert_Bernier · January 4, 2024, 4:49pm

Hi again,

This has caught my attention.

VNuser:

2024-01-04 07:50:44.941 UTC [682] WARNING:  specified neither primary_conninfo nor restore_command
2024-01-04 07:50:44.941 UTC [682] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.

Take a look at your configuration and check these parameters; the messages above suggest there is an issue with them:

primary_conninfo
restore_command

TIP: Because your root problem lies with Postgres, you should simplify your environment. For example, put Patroni into maintenance mode and debug Postgres manually.
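Patroni's maintenance mode is entered with `patronictl pause` and left with `patronictl resume`. While paused, Patroni will not try to start, stop, or fail over PostgreSQL, so it can be handled by hand.

One way to keep a manual start from calling pgBackRest is to first comment out restore_command and archive_command in a copy of the configuration. The helper below is hypothetical, and which parameters to disable is an assumption. As the maintenance-pod log shows, a server with standby.signal and no restore_command still waits in standby mode for WAL in pg_wal.

```python
import re
from typing import Iterable

# A setting line starts with the parameter name (optionally indented).
_NAME = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)")


def comment_out(conf_text: str, names: Iterable[str]) -> str:
    """Return conf_text with every active line that sets one of `names`
    prefixed by '#'. All other lines, including comments, are unchanged."""
    wanted = {n.lower() for n in names}
    out = []
    for line in conf_text.splitlines(keepends=True):
        m = _NAME.match(line)
        if m and m.group(1).lower() in wanted:
            line = "#" + line
        out.append(line)
    return "".join(out)
```

Keeping the original lines as comments makes the change easy to revert before running `patronictl resume`.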

Hope this helps.

VNuser · January 5, 2024, 9:22am

Hi Robert.

Thanks for the support.

We did something similar, but used a separate maintenance pod with the existing PostgreSQL data folder, which indeed made debugging much easier, and we managed to start the database and create a dump!

Thanks a lot
Regards,
Vitaly
