Any idea why I am getting the error below after restarting my replica PostgreSQL server?
2022-11-29 08:24:37.129 UTC  LOG: starting PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
2022-11-29 08:24:37.129 UTC  LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-29 08:24:37.129 UTC  LOG: listening on IPv6 address "::", port 5432
2022-11-29 08:24:37.130 UTC  LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-11-29 08:24:37.143 UTC  LOG: database system was shut down in recovery at 2022-11-29 08:24:34 UTC
2022-11-29 08:24:37.143 UTC  LOG: entering standby mode
2022-11-29 08:24:37.152 UTC  LOG: redo starts at 40/460000A0
2022-11-29 08:24:37.194 UTC  LOG: consistent recovery state reached at 40/469AEA28
2022-11-29 08:24:37.194 UTC  LOG: invalid record length at 40/469AEA28: wanted 24, got 0
2022-11-29 08:24:37.195 UTC  LOG: database system is ready to accept read-only connections
2022-11-29 08:24:37.222 UTC  LOG: started streaming WAL from primary at 40/46000000 on timeline 1
2022-11-29 08:28:22.427 UTC  LOG: received fast shutdown request
2022-11-29 08:28:22.581 UTC  LOG: aborting any active transactions
2022-11-29 08:28:22.581 UTC  FATAL: terminating walreceiver process due to administrator command
2022-11-29 08:28:22.585 UTC  LOG: shutting down
2022-11-29 08:28:23.503 UTC  LOG: database system is shut down
Can someone help me with this issue?
I suspect your REPLICA is out of sync because it read a WAL segment that was on a different timeline than your PRIMARY.
There are a couple of ways to check:
On both the PRIMARY and the REPLICA, look in the pg_wal directory for the most recent WAL segment and compare the timelines. If the REPLICA is out of sync, the timeline of the WAL segment on the PRIMARY will differ from the one on the REPLICA.
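To read the timeline off a segment name: a WAL segment file name is 24 hex digits, and the first 8 of them are the timeline ID. Here is a small sketch of that, assuming you copy the file names out of pg_wal yourself; the helper name wal_timeline is made up for illustration.

```python
import re

# A WAL segment name is 24 hex digits: 8 for the timeline, 16 for the position.
# A segment left over from a promotion may carry a ".partial" suffix.
_WAL_NAME = re.compile(r"^([0-9A-Fa-f]{8})[0-9A-Fa-f]{16}(\.partial)?$")


def wal_timeline(segment_name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name.

    Hypothetical helper: it only parses the name, it never touches the file.
    """
    match = _WAL_NAME.match(segment_name)
    if match is None:
        raise ValueError(f"not a WAL segment name: {segment_name!r}")
    return int(match.group(1), 16)
```

Run it on a segment name from each host. If the two numbers differ, the hosts are on different timelines. Keep in mind that pg_wal can also hold recycled segments, so the highest file name is not always the most recently written one.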
The old-school way: run the following command on both hosts and compare the outputs. If the REPLICA is out of sync, the difference shows up in several of the fields in the output.
EXAMPLE: use pg_controldata
/usr/lib/postgresql/14/bin/pg_controldata -D /opt/pg14/data | less
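If comparing the two outputs by eye is tedious, something like the following could do it. This is only a sketch: it assumes you have saved the pg_controldata output from each host as text, and the function names and the default list of fields to compare are my own choice, not anything built into PostgreSQL.

```python
def parse_controldata(text: str) -> dict:
    """Turn "Field name:   value" lines from pg_controldata into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no "name: value" shape
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(primary_text: str, replica_text: str,
                     keys=("Database system identifier",
                           "Latest checkpoint's TimeLineID")) -> dict:
    """Return {field: (primary_value, replica_value)} for fields that differ.

    The default keys are an assumption about which fields matter most here.
    """
    primary = parse_controldata(primary_text)
    replica = parse_controldata(replica_text)
    return {k: (primary.get(k), replica.get(k))
            for k in keys if primary.get(k) != replica.get(k)}
```

An empty result means the checked fields agree on both hosts. A different timeline ID or system identifier would point to the REPLICA having diverged.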
REMEDIATION (assuming it’s out of sync): either take a new base backup or use pg_rewind to resync the REPLICA to the PRIMARY.
Refer to the PostgreSQL documentation for more information on how to use pg_rewind.
There are also other possible issues, including:
- The REPLICA is misconfigured and is reading the wrong timeline
- You may be missing WAL segments.
In any case, the REPLICA is now out of sync, but there isn’t enough information to tell how it got that way.
Thanks for the quick response.
But the replica is running fine. This issue occurs only after restarting the replica DB server.
After the restart, the replica server runs fine and replication from the Master DB is also fine.
Good to know. As long as the REPLICA is picking up from the PRIMARY, you shouldn’t need to worry.
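One rough way to confirm that is to compare the PRIMARY's current WAL position with the position the REPLICA has replayed. Below is a minimal sketch, assuming you collect the two LSNs yourself (for example with pg_current_wal_lsn() on the PRIMARY and pg_last_wal_replay_lsn() on the REPLICA). The helper names lsn_to_bytes and replay_lag_bytes are made up for illustration.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as "40/469AEA28" into an absolute byte position.

    An LSN is printed as two hex numbers: the high and low 32 bits
    of a 64-bit position in the WAL stream.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (hypothetical helper)."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)
```

A lag that stays near zero, or keeps shrinking after the restart, suggests the REPLICA is keeping up with the PRIMARY.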
So we can ignore the above error as long as the slave is running fine, with no replication break or issue from the Master.
Apart from suggesting a cluster-wide vacuum or a repack using pg_repack, I’d recommend performing a test restore from a logical dump, which you should be doing anyway, just to make certain things are good.