Bug in pg_stat_monitor version 2.1. Database hangs

Version: psql (PostgreSQL) 16.6 - Percona Distribution.

The database hangs when the pg_stat_monitor query runs to fetch metrics. During this time no entries are written to the error log, the database is unresponsive, and new logins fail.
Killing the PID of the backend running that query (found via ps -ef) brought the DB back to a normal state.
This has happened three days in a row, and we have only now identified the cause.
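For reference, if it happens again it may be possible to end the offending backend from a SQL session instead of kill -9 (signal 9 forces the whole cluster into crash recovery, as the log below shows). This is a generic sketch, assuming a superuser session can still be opened and that filtering on the query text is enough to find the backend:

```sql
-- Find backends currently running the pg_stat_monitor fetch
-- (the LIKE filter on the query text is an assumption; adjust as needed).
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query LIKE '%pg_stat_monitor%'
  AND pid <> pg_backend_pid();

-- Ask PostgreSQL to end the backend cleanly instead of kill -9.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE '%pg_stat_monitor%'
  AND pid <> pg_backend_pid();
```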

Has anyone else faced the same issue? Is this a bug in version 2.1, or in the Percona distribution?

Below is the query that caused the problem:

```sql
SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_monitor"."cmd_type_text", "pg_stat_monitor"."elevel", "pg_stat_monitor"."sqlcode", "pg_stat_monitor"."message", "pg_stat_monitor"."pgsm_query_id", "pg_stat_monitor"."dbid", "pg_stat_monitor"."total_exec_time", "pg_stat_monitor"."min_exec_time", "pg_stat_monitor"."max_exec_time", "pg_stat_monitor"."mean_exec_time", "pg_stat_monitor"."stddev_exec_time", "pg_stat_monitor"."total_plan_time", "pg_stat_monitor"."min_plan_time", "pg_stat_monitor"."max_plan_time", "pg_stat_monitor"."mean_plan_time", "pg_stat_monitor"."wal_records", "pg_stat_monitor"."wal_fpi", "pg_stat_monitor"."wal_bytes", "pg_stat_monitor"."plans", "pg_stat_monitor"."comments", "pg_stat_monitor"."bucket_start_time", "pg_stat_monitor"."username" FROM "pg_stat_monitor" WHERE queryid IS NOT NULL AND query IS NOT NULL AND bucket_done AND pgsm_query_id IS NOT NULL
```

Below are the logs: writing stopped after 01:11 AM when the database hung, and resumed after the process was killed.

pg_stat_user_tables
```
2025-02-25 01:03:41 IST [1021497]: [1-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15LOG: process 1021497 still waiting for ShareLock on transaction 216925398 after 1000.044 ms
2025-02-25 01:03:41 IST [1021497]: [2-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15DETAIL: Process holding the lock: 655052. Wait queue: 1021497.
2025-02-25 01:03:41 IST [1021497]: [3-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15CONTEXT: while locking tuple (0,2) in relation "qrtz_locks"
2025-02-25 01:03:41 IST [1021497]: [4-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15STATEMENT: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'MetabaseScheduler' AND LOCK_NAME = $1 FOR UPDATE
2025-02-25 01:04:01 IST [71003]: [1-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15LOG: process 71003 still waiting for ShareLock on transaction 216925398 after 1000.044 ms
2025-02-25 01:04:01 IST [71003]: [2-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15DETAIL: Process holding the lock: 655052. Wait queue: 1021497, 71003.
2025-02-25 01:04:01 IST [71003]: [3-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15CONTEXT: while locking tuple (0,1) in relation "qrtz_locks"
2025-02-25 01:04:01 IST [71003]: [4-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15STATEMENT: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'MetabaseScheduler' AND LOCK_NAME = $1 FOR UPDATE
2025-02-25 01:11:17 IST [68307]: [43-1] user=,db=,host=LOG: checkpoint starting: time

2025-02-25 01:32:09 IST [68293]: [14-1] user=,db=,host=LOG: server process (PID 69787) was terminated by signal 9: Killed
2025-02-25 01:32:09 IST [68293]: [15-1] user=,db=,host=DETAIL: Failed process was running: SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_mon
2025-02-25 01:32:09 IST [68293]: [16-1] user=,db=,host=LOG: terminating any other active server processes
2025-02-25 01:32:10 IST [1084388]: [1-1] user=postgres,db=postgres,host=[local]FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084387]: [1-1] user=applicationid,db=reporting_master,host=100.0.8.106FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084389]: [1-1] user=applicationid,db=reporting_master,host=100.0.10.139FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084391]: [1-1] user=pmmusr,db=postgres,host=172.150.0.56FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084392]: [1-1] user=applicationid,db=keyman ,host=100.0.7.172FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084395]: [1-1] user=applicationid,db=cim,host=100.0.9.30FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084396]: [1-1] user=applicationid,db=cim,host=100.0.9.178FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084397]: [1-1] user=applicationid,db=keyman ,host=100.0.9.178FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084398]: [1-1] user=applicationid,db=keyman ,host=100.0.9.152FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084399]: [1-1] user=applicationid,db=keyman ,host=100.0.8.106FATAL: the database system is in recovery mode
```
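As a side note, the ShareLock waits on qrtz_locks earlier in the log (process 655052 holding the lock, 1021497 and 71003 queued) are ordinary row-lock waits and can be inspected while they happen with pg_blocking_pids. This is a generic diagnostic sketch, not output from this incident:

```sql
-- List sessions that are blocked, which PIDs block them, and what they are waiting on.
SELECT waiting.pid                   AS waiting_pid,
       pg_blocking_pids(waiting.pid) AS blocked_by,
       waiting.wait_event_type,
       waiting.wait_event,
       left(waiting.query, 80)       AS waiting_query
FROM pg_stat_activity AS waiting
WHERE cardinality(pg_blocking_pids(waiting.pid)) > 0;
```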


Dear All,
Can anyone confirm whether you have faced the same kind of issue?

Hi Harish,

We're facing the same issue: the database hangs on the pg_stat_monitor wait type, and killing some random session from the OS brings the DB back online. How are you sure that this specific query is the one causing the DB to hang?

Hi Srikanth,
The query I shared is the problem. It comes from pg_stat_monitor itself: when it tries to fetch metrics from the DB, it hangs.
It may be better to replace the extension with pg_stat_statements.
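If you want to try that swap, here is a minimal sketch, assuming superuser access, a restart window, and that no other libraries are listed in shared_preload_libraries:

```sql
-- Load pg_stat_statements instead of pg_stat_monitor at server start
-- (adjust the list if other libraries are preloaded).
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Restart PostgreSQL so the new preload library takes effect, then:
DROP EXTENSION IF EXISTS pg_stat_monitor;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```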

Hi Harish,

Thank you for sharing the query info! I was able to partially reproduce the issue: I ran the PMM SELECT above manually and monitored the wait_event of active sessions, and observed active sessions waiting on the pg_stat_monitor wait event.
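For anyone else trying to reproduce it, this is roughly how the wait events can be watched while the SELECT runs in another session; a generic pg_stat_activity query, not the exact command I used:

```sql
-- Show what every active session is currently waiting on, longest-running first.
SELECT pid, datname, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY running_for DESC;
```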

Cool!! Let me know if you find any resolution.