Segmentation fault causing database to restart every minute

We have two database clusters in the same K8s cluster, and both started to fail and restart every minute due to a segmentation fault in pg_stat_monitor. I have not been able to find the cause and am looking for some assistance. Disabling the PMM client for both clusters resolves the issue. I have also deleted and reinstalled the PMM Helm chart, then added the pmm-client back to each cluster, and the error returns. See the log below for the error.

2024-12-17 15:17:57.092 UTC [174] LOG: database system is ready to accept connections
2024-12-17 15:18:25.998 UTC [3041] LOG: could not receive data from client: Connection reset by peer
2024-12-17 15:18:25.998 UTC [3041] STATEMENT: START_REPLICATION 1AA/DD000000 TIMELINE 34
2024-12-17 15:18:25.998 UTC [3041] LOG: unexpected EOF on standby connection
2024-12-17 15:18:25.998 UTC [3041] STATEMENT: START_REPLICATION 1AA/DD000000 TIMELINE 34
2024-12-17 15:18:58.601 UTC [174] LOG: server process (PID 3257) was terminated by signal 11: Segmentation fault
2024-12-17 15:18:58.601 UTC [174] DETAIL: Failed process was running: SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_mon
2024-12-17 15:18:58.601 UTC [174] LOG: terminating any other active server processes
2024-12-17 15:18:58.603 UTC [3266] FATAL: the database system is in recovery mode
2024-12-17 15:18:58.604 UTC [174] LOG: all server processes terminated; reinitializing
2024-12-17 15:18:58.605 UTC [174] LOG: [pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated.
2024-12-17 15:18:58.859 UTC [3267] LOG: database system was interrupted; last known up at 2024-12-17 15:17:57 UTC
2024-12-17 15:18:58.864 UTC [3270] FATAL: the database system is in recovery mode
2024-12-17 15:18:58.871 UTC [3267] LOG: database system was not properly shut down; automatic recovery in progress
2024-12-17 15:18:58.874 UTC [3267] LOG: redo starts at 1AA/DD013648
2024-12-17 15:18:58.874 UTC [3267] LOG: invalid record length at 1AA/DD013BA0: wanted 24, got 0
2024-12-17 15:18:58.874 UTC [3267] LOG: redo done at 1AA/DD013B68 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-12-17 15:18:58.878 UTC [3268] LOG: checkpoint starting: end-of-recovery immediate wait
2024-12-17 15:18:58.889 UTC [3268] LOG: checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.003 s, total=0.013 s; sync files=4, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB
2024-12-17 15:18:58.893 UTC [174] LOG: database system is ready to accept connections
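To confirm that every restart follows the same pattern (a backend killed by signal 11 while running the query tagged /* agent='pgstatmonitor' */), the log can be scanned for crash lines and the DETAIL line that follows each one. Below is a minimal Python sketch, assuming the log_line_prefix produces the "<date> <time> <tz> [<pid>] <LEVEL>:" format shown above; the function name find_crashes is hypothetical and not part of PMM or PostgreSQL.

```python
import re
import sys

# Assumes lines look like the log above: "<date> <time> <tz> [<pid>] <LEVEL>: <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+ \S+) \[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")
# The failing query above starts with a comment like /* agent='pgstatmonitor' */;
# \W? skips one optional quote character, straight or curly.
AGENT_RE = re.compile(r"/\*\s*agent=\W?(\w+)")


def find_crashes(lines):
    """Collect backend crashes plus the agent named in the matching DETAIL line, if any."""
    crashes = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that do not match the assumed prefix
        msg = m.group("msg")
        crash = CRASH_RE.search(msg)
        if crash:
            crashes.append({
                "ts": m.group("ts"),
                "postmaster_pid": int(m.group("pid")),
                "pid": int(crash.group(1)),
                "signal": int(crash.group(2)),
                "agent": None,
            })
        elif (
            m.group("level") == "DETAIL"
            and msg.startswith("Failed process was running:")
            and crashes
            and crashes[-1]["agent"] is None
            and crashes[-1]["postmaster_pid"] == int(m.group("pid"))
        ):
            # Attach the DETAIL line to the crash reported by the same postmaster.
            agent = AGENT_RE.search(msg)
            if agent:
                crashes[-1]["agent"] = agent.group(1)
    return crashes


if __name__ == "__main__":
    # Each command-line argument is a path to a saved PostgreSQL log file.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for c in find_crashes(fh):
                print(f"{c['ts']}: PID {c['pid']} terminated by signal "
                      f"{c['signal']} (agent={c['agent']})")
```

Running it as `python find_crashes.py postgresql.log` (both file names are placeholders) prints one line per crash, so a crash loop that always names the same agent stands out quickly.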

Hi @Lobo_Lobo20919, starting from PGO 2.5.0 you can choose the querySource via the CR. Could you please test with pgstatstatements? We also need to know your PGO and PostgreSQL versions. Thank you.
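Switching the query source amounts to changing one field in the cluster's Custom Resource. The Python sketch below only builds a JSON merge patch and prints a matching `kubectl patch` command; it assumes the option lives at `spec.pmm.querySource` and accepts `pgstatmonitor` or `pgstatstatements` (check the operator's CR reference for your version), and the cluster name `cluster1` is a placeholder.

```python
import json

# Placeholder: replace with the metadata.name of your PerconaPGCluster.
CLUSTER = "cluster1"

# Assumed accepted values for spec.pmm.querySource (PGO 2.5.0 and later).
ALLOWED_SOURCES = {"pgstatmonitor", "pgstatstatements"}


def query_source_patch(source: str) -> str:
    """Return a JSON merge patch that sets the (assumed) spec.pmm.querySource field."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(
            f"querySource must be one of {sorted(ALLOWED_SOURCES)}, got {source!r}"
        )
    return json.dumps({"spec": {"pmm": {"querySource": source}}})


if __name__ == "__main__":
    patch = query_source_patch("pgstatstatements")
    # Print the command rather than running it, so it can be reviewed first.
    print(f"kubectl patch perconapgcluster {CLUSTER} --type=merge -p '{patch}'")
```

Printing the command instead of executing it keeps the sketch free of cluster access and leaves the final check of the field name to the reader.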

I’m using PGO 2.5.0 and PostgreSQL 15. I’m confused by the “querySource” link, as it points to a PGO restore issue I opened a while back that seems unrelated to this issue with pgstatstatements.

My only guess is that the 2.5.0 upgrade done a month ago caused it at some point afterward. However, I have three other K8s clusters with the same PGO version and PostgreSQL 15, also upgraded this year, that do not have any issues.

Thanks

And yes, I’ll try switching to pgstatstatements and post the results.

Yes, switching to pgstatstatements worked. Thanks for the suggestion.

Sorry, I added the wrong link. I have corrected it.