Hi,
Can someone be so kind as to explain to me why we get a new timeline each time we do a switchover / failover?
We are using PostgreSQL 11 with a Percona Patroni setup. We have 3 nodes, one of them a sync standby, so I would expect the switchover to happen without using a .partial / WAL file or anything similar, and I can see no need to increase the timeline ID.
It's a fresh setup because we are evaluating its use, so no data has been imported yet and no applications are connected.
Example log from the sync standby that got promoted:
2023-03-07 17:17:14.668 CET [2329480] LOG: incomplete startup packet
2023-03-07 17:17:14.668 CET [2329481] LOG: incomplete startup packet
2023-03-07 17:17:14.669 CET [2329482] LOG: incomplete startup packet
2023-03-07 17:17:23.227 CET [2329517] LOG: incomplete startup packet
2023-03-07 17:17:23.227 CET [2329518] LOG: incomplete startup packet
2023-03-07 17:17:23.227 CET [2329519] LOG: incomplete startup packet
2023-03-07 17:17:24.333 CET [436384] LOG: replication terminated by primary server
2023-03-07 17:17:24.333 CET [436384] DETAIL: End of WAL reached on timeline 4 at 0/4000F08.
2023-03-07 17:17:24.333 CET [436384] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2023-03-07 17:17:24.333 CET [436331] LOG: invalid record length at 0/4000F08: wanted 24, got 0
2023-03-07 17:17:24.342 CET [2329526] FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2023-03-07 17:17:25.529 CET [436327] LOG: received SIGHUP, reloading configuration files
2023-03-07 17:17:25.536 CET [2329537] FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "10.224.22.75" and accepting
TCP/IP connections on port 5432?
2023-03-07 17:17:25.583 CET [436331] LOG: received promote request
2023-03-07 17:17:25.583 CET [436331] LOG: redo done at 0/4000E98
2023-03-07 17:17:25.618 CET [436331] LOG: selected new timeline ID: 5
2023-03-07 17:17:25.768 CET [436331] LOG: archive recovery complete
2023-03-07 17:17:25.798 CET [436327] LOG: database system is ready to accept connections
2023-03-07 17:17:26.027 CET [2329547] ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:26.027 CET [2329547] STATEMENT: START_REPLICATION SLOT "psql03" 0/4000000 TIMELINE 4
2023-03-07 17:17:26.049 CET [2329548] ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:26.049 CET [2329548] STATEMENT: START_REPLICATION SLOT "psql03" 0/4000000 TIMELINE 5
2023-03-07 17:17:26.071 CET [2329549] ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:26.071 CET [2329549] STATEMENT: START_REPLICATION SLOT "psql03" 0/4000000 TIMELINE 5
2023-03-07 17:17:27.803 CET [2329560] LOG: standby "psql01" is now a synchronous standby with priority 1
2023-03-07 17:17:27.803 CET [2329560] STATEMENT: START_REPLICATION SLOT "psql01" 0/4000000 TIMELINE 4
2023-03-07 17:17:29.668 CET [2329568] LOG: incomplete startup packet
2023-03-07 17:17:29.668 CET [2329569] LOG: incomplete startup packet
2023-03-07 17:17:29.668 CET [2329570] LOG: incomplete startup packet
2023-03-07 17:17:33.227 CET [2329587] LOG: incomplete startup packet
2023-03-07 17:17:33.228 CET [2329588] LOG: incomplete startup packet
2023-03-07 17:17:33.228 CET [2329589] LOG: incomplete startup packet
2023-03-07 17:17:36.688 CET [436327] LOG: received SIGHUP, reloading configuration files
2023-03-07 17:17:36.689 CET [436327] LOG: parameter "synchronous_standby_names" changed to ""psql01""
2023-03-07 17:17:43.228 CET [2329651] LOG: incomplete startup packet
2023-03-07 17:17:43.228 CET [2329652] LOG: incomplete startup packet
2023-03-07 17:17:43.229 CET [2329650] LOG: incomplete startup packet
2023-03-07 17:17:44.668 CET [2329657] LOG: incomplete startup packet
2023-03-07 17:17:44.668 CET [2329658] LOG: incomplete startup packet
Example from the previous master on switchover:
2023-03-07 17:17:23.807 CET [3000727] LOG: incomplete startup packet
2023-03-07 17:17:23.808 CET [3000728] LOG: incomplete startup packet
2023-03-07 17:17:23.809 CET [3000729] LOG: incomplete startup packet
2023-03-07 17:17:24.273 CET [2887] LOG: received fast shutdown request
2023-03-07 17:17:24.278 CET [2887] LOG: aborting any active transactions
2023-03-07 17:17:24.278 CET [2897] FATAL: terminating connection due to administrator command
2023-03-07 17:17:24.281 CET [2887] LOG: background worker "logical replication launcher" (PID 549571) exited with exit code 1
2023-03-07 17:17:24.282 CET [2892] LOG: shutting down
2023-03-07 17:17:24.342 CET [2887] LOG: database system is shut down
And the replica (not sync):
2023-03-07 17:17:25.989 CET [1894523] LOG: database system was shut down in recovery at 2023-03-07 17:17:25 CET
2023-03-07 17:17:25.990 CET [1894523] LOG: entering standby mode
2023-03-07 17:17:25.997 CET [1894523] LOG: consistent recovery state reached at 0/4000F08
2023-03-07 17:17:25.997 CET [1894523] LOG: invalid record length at 0/4000F08: wanted 24, got 0
2023-03-07 17:17:25.998 CET [1894520] LOG: database system is ready to accept read only connections
2023-03-07 17:17:26.007 CET [1894524] FATAL: the database system is starting up
2023-03-07 17:17:26.021 CET [1894528] LOG: fetching timeline history file for timeline 5 from primary server
2023-03-07 17:17:26.027 CET [1894528] FATAL: could not start WAL streaming: ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:26.028 CET [1894523] LOG: new target timeline is 5
2023-03-07 17:17:26.049 CET [1894532] FATAL: could not start WAL streaming: ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:26.071 CET [1894534] FATAL: could not start WAL streaming: ERROR: replication slot "psql03" does not exist
2023-03-07 17:17:30.454 CET [1894556] LOG: incomplete startup packet
2023-03-07 17:17:30.454 CET [1894557] LOG: incomplete startup packet
2023-03-07 17:17:30.454 CET [1894558] LOG: incomplete startup packet
2023-03-07 17:17:31.080 CET [1894560] LOG: started streaming WAL from primary at 0/4000000 on timeline 5
2023-03-07 17:17:31.151 CET [1894523] LOG: redo starts at 0/4000F08
2023-03-07 17:17:31.760 CET [1894565] LOG: incomplete startup packet
2023-03-07 17:17:31.760 CET [1894566] LOG: incomplete startup packet
2023-03-07 17:17:31.761 CET [1894567] LOG: incomplete startup packet
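To make the timeline changes easier to spot among the "incomplete startup packet" noise, here is a minimal sketch of a filter for logs like the ones above. It is only a hypothetical helper (the names `TIMELINE_PATTERNS` and `timeline_events` are made up for illustration, not part of PostgreSQL or Patroni), and it assumes the message wording is exactly as it appears in these PostgreSQL 11 logs.

```python
import re

# Message fragments copied from the logs above; the wording may differ
# in other PostgreSQL versions or locales (assumption).
TIMELINE_PATTERNS = [
    re.compile(r"selected new timeline ID: (\d+)"),
    re.compile(r"new target timeline is (\d+)"),
    re.compile(r"End of WAL reached on timeline (\d+)"),
    re.compile(r"started streaming WAL from primary at \S+ on timeline (\d+)"),
]


def timeline_events(log_text):
    """Return (line, timeline_id) pairs for log lines that mention a timeline."""
    events = []
    for line in log_text.splitlines():
        for pattern in TIMELINE_PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((line.strip(), int(match.group(1))))
                break  # one match per line is enough
    return events


if __name__ == "__main__":
    import sys

    # Usage: python timeline_events.py < postgresql.log
    for line, timeline_id in timeline_events(sys.stdin.read()):
        print(timeline_id, line)
```

Run against the three logs above, this should leave only the lines mentioning timelines 4 and 5, which is the part I am asking about.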
I also reviewed some online blogs about switchover and couldn't find the above errors in the examples shown there.
Thank you very much and
kind regards
Marcel