Hi All,
Any idea why I am getting the error below after restarting my replica PostgreSQL server?
2022-11-29 08:24:37.129 UTC [8196] LOG: starting PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
2022-11-29 08:24:37.129 UTC [8196] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-29 08:24:37.129 UTC [8196] LOG: listening on IPv6 address "::", port 5432
2022-11-29 08:24:37.130 UTC [8196] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-11-29 08:24:37.143 UTC [8198] LOG: database system was shut down in recovery at 2022-11-29 08:24:34 UTC
2022-11-29 08:24:37.143 UTC [8198] LOG: entering standby mode
2022-11-29 08:24:37.152 UTC [8198] LOG: redo starts at 40/460000A0
2022-11-29 08:24:37.194 UTC [8198] LOG: consistent recovery state reached at 40/469AEA28
2022-11-29 08:24:37.194 UTC [8198] LOG: invalid record length at 40/469AEA28: wanted 24, got 0
2022-11-29 08:24:37.195 UTC [8196] LOG: database system is ready to accept read-only connections
2022-11-29 08:24:37.222 UTC [8202] LOG: started streaming WAL from primary at 40/46000000 on timeline 1
2022-11-29 08:28:22.427 UTC [8196] LOG: received fast shutdown request
2022-11-29 08:28:22.581 UTC [8196] LOG: aborting any active transactions
2022-11-29 08:28:22.581 UTC [8202] FATAL: terminating walreceiver process due to administrator command
2022-11-29 08:28:22.585 UTC [8199] LOG: shutting down
2022-11-29 08:28:23.503 UTC [8196] LOG: database system is shut down
Can someone help me with this issue?
Hi,
I suspect your REPLICA is out of sync because it read a WAL segment that was on a different timeline than your PRIMARY.
There are a couple of ways to check:
- On both PRIMARY and REPLICA, look in the pg_wal directory for the most recent WAL segment and compare the timelines (the first 8 hex digits of the file name). If the REPLICA is out of sync, the timeline of the PRIMARY’s segment will differ from that of the REPLICA’s segment.
- This is old school: run pg_controldata on both hosts and compare the outputs. If the REPLICA is out of sync, it shows up in the various pieces of information that are output.
EXAMPLE: use pg_controldata
/usr/lib/postgresql/14/bin/pg_controldata -D /opt/pg14/data | less
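If you’d rather not eyeball it, both checks can be sketched in Python. This is only a rough sketch: the helper names (wal_timeline, parse_controldata, same_timeline) are made up for illustration, and it assumes the pg_controldata output uses its usual "label: value" lines, including the "Latest checkpoint's TimeLineID" field.

```python
import re

# WAL segment file names are 24 hex digits: 8 for the timeline ID,
# then 16 that identify the segment's position in the WAL stream.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def wal_timeline(segment_name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name."""
    if not WAL_NAME.match(segment_name):
        raise ValueError(f"not a WAL segment file name: {segment_name!r}")
    return int(segment_name[:8], 16)


def parse_controldata(text: str) -> dict:
    """Turn pg_controldata's 'label: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def same_timeline(primary_text: str, replica_text: str) -> bool:
    """Compare the checkpoint timeline reported by pg_controldata on both hosts."""
    key = "Latest checkpoint's TimeLineID"  # assumed field label
    return parse_controldata(primary_text)[key] == parse_controldata(replica_text)[key]
```

Assuming the default 16 MB segment size, the position 40/46000000 on timeline 1 from the log above falls in segment 000000010000004000000046, and wal_timeline() on that name gives 1. Keep in mind that the checkpoint values on a REPLICA normally trail the PRIMARY, so the timeline is the field worth comparing, not the locations.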
REMEDIATION (assuming it’s out of sync): either take a new base backup with pg_basebackup, or use pg_rewind to resync the REPLICA with the PRIMARY.
Refer to the PostgreSQL documentation for more information on how to use pg_rewind.
There are also other possible issues, including:
- The REPLICA is misconfigured and is reading the wrong timeline
- You may be missing WAL segments.
In any case, the REPLICA is now out of sync, but there isn’t enough information to tell how it got that way.
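To look for missing WAL segments, something along these lines could help. It’s a minimal sketch, not a definitive tool: missing_segments is a made-up helper, and it assumes the default 16 MB wal_segment_size (256 segments per "log" number in the file name). It is more meaningful against a WAL archive directory than against a live pg_wal, where segments get recycled.

```python
import re
from collections import defaultdict

WAL_NAME = re.compile(r"^[0-9A-F]{24}$")
SEGMENTS_PER_LOG = 0x100  # assumption: default 16 MB wal_segment_size


def missing_segments(names):
    """Return the names of segments absent between the first and last
    segment found on each timeline."""
    by_timeline = defaultdict(set)
    for name in names:
        # Only plain segment files; skips .history, .partial, archive_status, etc.
        if WAL_NAME.match(name):
            timeline = int(name[:8], 16)
            log = int(name[8:16], 16)
            seg = int(name[16:], 16)
            by_timeline[timeline].add(log * SEGMENTS_PER_LOG + seg)

    missing = []
    for timeline in sorted(by_timeline):
        present = by_timeline[timeline]
        for number in range(min(present), max(present) + 1):
            if number not in present:
                log, seg = divmod(number, SEGMENTS_PER_LOG)
                missing.append(f"{timeline:08X}{log:08X}{seg:08X}")
    return missing
```

You would feed it a directory listing, e.g. missing_segments(os.listdir(archive_dir)) after importing os, where archive_dir is wherever your archived WAL lives (hypothetical here). An empty list means no gaps between the oldest and newest segment on each timeline.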
Hi @Robert_Bernier
Thanks for the quick response.
But the replica is running fine. The message appears only right after restarting the replica DB server.
After the restart, the replica server runs fine and replication from the master DB is also fine.
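Good to know. As long as the REPLICA is picking up from the PRIMARY, you shouldn’t need to worry. Note that the "invalid record length" line is logged at LOG level, not ERROR, and in your log streaming from the PRIMARY starts right after it.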
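One way to confirm the REPLICA really is picking up is to compare WAL positions: pg_current_wal_lsn() on the PRIMARY against pg_last_wal_replay_lsn() on the REPLICA. Here is a small sketch of the arithmetic in Python; the helper names lsn_to_int and lag_bytes are just for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN written as 'hi/lo' (two hex numbers) to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the REPLICA still has to replay to reach the PRIMARY."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)
```

A lag that stays small, or keeps returning to 0 when the PRIMARY is idle, suggests replication is healthy. A value that only grows would be worth investigating.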
So we can ignore the above error if the slave is running fine, with no replication break or issue from the master?
Apart from a cluster-wide vacuum, or a repack using pg_repack, I’d suggest performing a test restore from a logical dump, which you should be doing anyway, just to make certain things are good.