Bug in pg_stat_monitor version 2.1. Database hangs

Version: psql (PostgreSQL) 16.6 - Percona Distribution.

The database hangs when the pg_stat_monitor query runs to fetch metrics. During this time no entries are written to the error log, the database is unresponsive, and new logins fail.
Killing the PID of the backend running that query (found via ps -ef) brought the DB back to a normal state.
This has happened three days in a row, and we have only now identified the cause.
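For reference, if it happens again it may be possible to end the offending backend from a SQL session instead of kill -9 (signal 9 forces the whole cluster into crash recovery, as the log below shows). This is a generic sketch, assuming a superuser session can still be opened and that filtering on the query text is enough to find the backend:

```sql
-- Find backends currently running the pg_stat_monitor fetch
-- (the LIKE filter on the query text is an assumption; adjust as needed).
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query LIKE '%pg_stat_monitor%'
  AND pid <> pg_backend_pid();

-- Ask PostgreSQL to end the backend cleanly instead of kill -9.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE '%pg_stat_monitor%'
  AND pid <> pg_backend_pid();
```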

Has anyone else faced the same issue? Is this a bug in version 2.1, or in the Percona distribution?

Below is the query that caused the problem:

```sql
SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_monitor"."cmd_type_text", "pg_stat_monitor"."elevel", "pg_stat_monitor"."sqlcode", "pg_stat_monitor"."message", "pg_stat_monitor"."pgsm_query_id", "pg_stat_monitor"."dbid", "pg_stat_monitor"."total_exec_time", "pg_stat_monitor"."min_exec_time", "pg_stat_monitor"."max_exec_time", "pg_stat_monitor"."mean_exec_time", "pg_stat_monitor"."stddev_exec_time", "pg_stat_monitor"."total_plan_time", "pg_stat_monitor"."min_plan_time", "pg_stat_monitor"."max_plan_time", "pg_stat_monitor"."mean_plan_time", "pg_stat_monitor"."wal_records", "pg_stat_monitor"."wal_fpi", "pg_stat_monitor"."wal_bytes", "pg_stat_monitor"."plans", "pg_stat_monitor"."comments", "pg_stat_monitor"."bucket_start_time", "pg_stat_monitor"."username" FROM "pg_stat_monitor" WHERE queryid IS NOT NULL AND query IS NOT NULL AND bucket_done AND pgsm_query_id IS NOT NULL
```

Below are the logs: writing stopped after 01:11 AM when the database hung, and resumed after the process was killed.

pg_stat_user_tables
```
2025-02-25 01:03:41 IST [1021497]: [1-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15LOG: process 1021497 still waiting for ShareLock on transaction 216925398 after 1000.044 ms
2025-02-25 01:03:41 IST [1021497]: [2-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15DETAIL: Process holding the lock: 655052. Wait queue: 1021497.
2025-02-25 01:03:41 IST [1021497]: [3-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15CONTEXT: while locking tuple (0,2) in relation "qrtz_locks"
2025-02-25 01:03:41 IST [1021497]: [4-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15STATEMENT: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'MetabaseScheduler' AND LOCK_NAME = $1 FOR UPDATE
2025-02-25 01:04:01 IST [71003]: [1-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15LOG: process 71003 still waiting for ShareLock on transaction 216925398 after 1000.044 ms
2025-02-25 01:04:01 IST [71003]: [2-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15DETAIL: Process holding the lock: 655052. Wait queue: 1021497, 71003.
2025-02-25 01:04:01 IST [71003]: [3-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15CONTEXT: while locking tuple (0,1) in relation "qrtz_locks"
2025-02-25 01:04:01 IST [71003]: [4-1] user=applicationid,db=techsupport_analytics_reports,host=100.0.2.15STATEMENT: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'MetabaseScheduler' AND LOCK_NAME = $1 FOR UPDATE
2025-02-25 01:11:17 IST [68307]: [43-1] user=,db=,host=LOG: checkpoint starting: time

2025-02-25 01:32:09 IST [68293]: [14-1] user=,db=,host=LOG: server process (PID 69787) was terminated by signal 9: Killed
2025-02-25 01:32:09 IST [68293]: [15-1] user=,db=,host=DETAIL: Failed process was running: SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_mon
2025-02-25 01:32:09 IST [68293]: [16-1] user=,db=,host=LOG: terminating any other active server processes
2025-02-25 01:32:10 IST [1084388]: [1-1] user=postgres,db=postgres,host=[local]FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084387]: [1-1] user=applicationid,db=reporting_master,host=100.0.8.106FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084389]: [1-1] user=applicationid,db=reporting_master,host=100.0.10.139FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084391]: [1-1] user=pmmusr,db=postgres,host=172.150.0.56FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084392]: [1-1] user=applicationid,db=keyman ,host=100.0.7.172FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084395]: [1-1] user=applicationid,db=cim,host=100.0.9.30FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084396]: [1-1] user=applicationid,db=cim,host=100.0.9.178FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084397]: [1-1] user=applicationid,db=keyman ,host=100.0.9.178FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084398]: [1-1] user=applicationid,db=keyman ,host=100.0.9.152FATAL: the database system is in recovery mode
2025-02-25 01:32:10 IST [1084399]: [1-1] user=applicationid,db=keyman ,host=100.0.8.106FATAL: the database system is in recovery mode
```
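As a side note, the ShareLock waits on qrtz_locks earlier in the log (process 655052 holding the lock, 1021497 and 71003 queued) are ordinary row-lock waits and can be inspected while they happen with pg_blocking_pids. This is a generic diagnostic sketch, not output from this incident:

```sql
-- List sessions that are blocked, which PIDs block them, and what they are waiting on.
SELECT waiting.pid                   AS waiting_pid,
       pg_blocking_pids(waiting.pid) AS blocked_by,
       waiting.wait_event_type,
       waiting.wait_event,
       left(waiting.query, 80)       AS waiting_query
FROM pg_stat_activity AS waiting
WHERE cardinality(pg_blocking_pids(waiting.pid)) > 0;
```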


Dear All,
Can anyone confirm whether you have faced the same kind of issue?

Hi Harish,

We're facing the same issue: the database hangs on the pg_stat_monitor wait type, and killing some random session from the OS brings the DB back online. How are you sure that this specific query is the one causing the DB to hang?

Hi Srikanth,
The query I shared is the problem. It comes from pg_stat_monitor itself: when it tries to fetch metrics from the DB, it hangs.
It may be better to replace the extension with pg_stat_statements.
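If you want to try that swap, here is a minimal sketch, assuming superuser access, a restart window, and that no other libraries are listed in shared_preload_libraries:

```sql
-- Load pg_stat_statements instead of pg_stat_monitor at server start
-- (adjust the list if other libraries are preloaded).
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Restart PostgreSQL so the new preload library takes effect, then:
DROP EXTENSION IF EXISTS pg_stat_monitor;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```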

Hi Harish,

Thank you for sharing the query info! I was able to partially reproduce the issue: I ran the PMM SELECT above manually and monitored the wait_event of active sessions, and observed active sessions waiting on the pg_stat_monitor wait event.
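For anyone else trying to reproduce it, this is roughly how the wait events can be watched while the SELECT runs in another session; a generic pg_stat_activity query, not the exact command I used:

```sql
-- Show what every active session is currently waiting on, longest-running first.
SELECT pid, datname, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY running_for DESC;
```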

Cool!! Let me know if you find any resolution.