Segmentation fault causing database to restart every minute

Lobo_Lobo20919 · December 17, 2024, 4:05pm

We have two clusters in the same K8s environment; both started to fail and restart every minute due to a segmentation fault in pg_stat_monitor. I have not been able to find the cause and am looking for some assistance. Disabling the PMM client for both clusters resolves the issue. I also deleted and reinstalled the PMM Helm chart, then added pmm-client back to each cluster, and the error returned. See the log below for the error.

2024-12-17 15:17:57.092 UTC [174] LOG: database system is ready to accept connections
2024-12-17 15:18:25.998 UTC [3041] LOG: could not receive data from client: Connection reset by peer
2024-12-17 15:18:25.998 UTC [3041] STATEMENT: START_REPLICATION 1AA/DD000000 TIMELINE 34
2024-12-17 15:18:25.998 UTC [3041] LOG: unexpected EOF on standby connection
2024-12-17 15:18:25.998 UTC [3041] STATEMENT: START_REPLICATION 1AA/DD000000 TIMELINE 34
2024-12-17 15:18:58.601 UTC [174] LOG: server process (PID 3257) was terminated by signal 11: Segmentation fault
2024-12-17 15:18:58.601 UTC [174] DETAIL: Failed process was running: SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_mon
2024-12-17 15:18:58.601 UTC [174] LOG: terminating any other active server processes
2024-12-17 15:18:58.603 UTC [3266] FATAL: the database system is in recovery mode
2024-12-17 15:18:58.604 UTC [174] LOG: all server processes terminated; reinitializing
2024-12-17 15:18:58.605 UTC [174] LOG: [pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated.
2024-12-17 15:18:58.859 UTC [3267] LOG: database system was interrupted; last known up at 2024-12-17 15:17:57 UTC
2024-12-17 15:18:58.864 UTC [3270] FATAL: the database system is in recovery mode
2024-12-17 15:18:58.871 UTC [3267] LOG: database system was not properly shut down; automatic recovery in progress
2024-12-17 15:18:58.874 UTC [3267] LOG: redo starts at 1AA/DD013648
2024-12-17 15:18:58.874 UTC [3267] LOG: invalid record length at 1AA/DD013BA0: wanted 24, got 0
2024-12-17 15:18:58.874 UTC [3267] LOG: redo done at 1AA/DD013B68 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-12-17 15:18:58.878 UTC [3268] LOG: checkpoint starting: end-of-recovery immediate wait
2024-12-17 15:18:58.889 UTC [3268] LOG: checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.003 s, total=0.013 s; sync files=4, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB
2024-12-17 15:18:58.893 UTC [174] LOG: database system is ready to accept connections

Slava_Sarzhan · December 17, 2024, 7:30pm

Hi @Lobo_Lobo20919, starting with PGO 2.5.0 you can choose the querySource via the CR. Could you please test with pgstatstatements? We also need to know your PGO and PG versions. Thank you.
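
For readers following along, here is a rough sketch of what that change might look like in the cluster CR. The thread only names the querySource option and the value pgstatstatements; the placement under spec.pmm, the apiVersion and kind, and the cluster name are assumptions, so check the CR reference for your operator version before applying anything.

```yaml
# Sketch only: the thread confirms the option name (querySource) and the
# value (pgstatstatements). Everything else here is an assumption.
apiVersion: pgv2.percona.com/v2   # assumed API group/version
kind: PerconaPGCluster            # assumed resource kind
metadata:
  name: cluster1                  # hypothetical cluster name
spec:
  pmm:
    enabled: true
    # Collect Query Analytics data from pg_stat_statements
    # instead of pg_stat_monitor.
    querySource: pgstatstatements
```

If the layout matches your version, applying the updated CR with kubectl apply -f should be enough for the operator to pick up the change.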

Lobo_Lobo20919 · December 17, 2024, 9:37pm

I’m using PGO 2.5.0 and postgres 15. I’m confused by the “querySource” link: it points to a PGO restore issue I opened a while back, which seems unrelated to this issue or to pgstatstatements.

My only guess is that the 2.5.0 upgrade, done a month ago, caused this at some point afterward. However, I have 3 other K8s clusters with the same PGO version and postgres 15 that were also upgraded this year and do not have any issues.

Thanks

Lobo_Lobo20919 · December 17, 2024, 9:46pm

And yes, I’ll try switching to pgstatstatements and post the results.

Lobo_Lobo20919 · December 18, 2024, 2:19pm

Yes, switching to pgstatstatements worked. Thanks for the suggestion.

Slava_Sarzhan · December 18, 2024, 2:29pm

Sorry, I have added the wrong link. I have corrected it.
