Errors in collecting metrics under high database load

Hello!
I have installed PMM Server version 3. On a heavily loaded test server running PostgreSQL 14, I installed PMM agent version 3 and pg_stat_monitor 2.2. Metrics are being received.

pg_stat_monitor settings on the monitored server:
pg_stat_monitor.pgsm_bucket_time: 60
pg_stat_monitor.pgsm_enable_overflow: true
pg_stat_monitor.pgsm_enable_pgsm_query_id: false
pg_stat_monitor.pgsm_enable_query_plan: true
pg_stat_monitor.pgsm_histogram_max: 30000
pg_stat_monitor.pgsm_histogram_min: 50
pg_stat_monitor.pgsm_max: 512
pg_stat_monitor.pgsm_max_buckets: 20
pg_stat_monitor.pgsm_normalized_query: true
pg_stat_monitor.pgsm_query_shared_buffer: 384
pg_stat_monitor.pgsm_track_application_names: false
pg_stat_monitor.pgsm_track_planning: false
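
For reference, the effective values can be read back from pg_settings on the monitored server; the context column also shows which of them require a restart to change. A minimal check, assuming the extension is loaded via shared_preload_libraries:

-- Confirm which pg_stat_monitor values are actually applied on the server
SELECT name, setting, unit, context
FROM pg_settings
WHERE name LIKE 'pg_stat_monitor.%'
ORDER BY name;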

Under high PostgreSQL load, once QPS exceeds 10K, the following errors occur:
caller=postgres_exporter.go:770 level=error err="Error opening connection to database (postgres://sa_percona_monitoring:PASSWORD_REMOVED@localhost:5432/base_1?connect_timeout=1&sslmode=disable): read tcp 127.0.0.1:34926->127.0.0.1:5432: i/o timeout" agentID=25836318-331d-4630-9cf8-7b739b9c5f4f component=agent-process type=postgres_exporter
caller=postgres_exporter.go:770 level=error err="Error opening connection to database (postgres://sa_percona_monitoring:PASSWORD_REMOVED@localhost:5432/base_2?connect_timeout=1&sslmode=disable): read tcp 127.0.0.1:34944->127.0.0.1:5432: i/o timeout" agentID=25836318-331d-4630-9cf8-7b739b9c5f4f component=agent-process type=postgres_exporter

Additionally, there are gaps in the graphs: PostgreSQL metrics are missing in PMM. During the periods when the pg_stat_monitor metrics disappear, work with PostgreSQL itself continues normally: the services keep sending queries to the database and receiving responses.
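
For what it's worth, a quick way to see whether the exporter's 1-second connect_timeout is simply being exceeded because the server is saturated with backends at those moments is a snapshot of pg_stat_activity taken during a gap (a rough sketch; the sa_percona_monitoring role name is taken from the errors above):

-- Snapshot of backend usage at the moment of a metrics gap:
-- how close the server is to max_connections, and whether the
-- monitoring role has backends open at that time.
SELECT count(*) AS total_backends,
       current_setting('max_connections') AS max_connections,
       count(*) FILTER (WHERE usename = 'sa_percona_monitoring') AS monitoring_backends
FROM pg_stat_activity;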

The following messages are appearing in the PostgreSQL logs:
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3145589.7", size 314170608
STATEMENT: SELECT /* agent='pgstatmonitor' */ "pg_stat_monitor"."bucket", "pg_stat_monitor"."client_ip", "pg_stat_monitor"."query", "pg_stat_monitor"."calls", "pg_stat_monitor"."shared_blks_hit", "pg_stat_monitor"."shared_blks_read", "pg_stat_monitor"."shared_blks_dirtied", "pg_stat_monitor"."shared_blks_written", "pg_stat_monitor"."local_blks_hit", "pg_stat_monitor"."local_blks_read", "pg_stat_monitor"."local_blks_dirtied", "pg_stat_monitor"."local_blks_written", "pg_stat_monitor"."temp_blks_read", "pg_stat_monitor"."temp_blks_written", "pg_stat_monitor"."resp_calls", "pg_stat_monitor"."cpu_user_time", "pg_stat_monitor"."cpu_sys_time", "pg_stat_monitor"."rows", "pg_stat_monitor"."relations", "pg_stat_monitor"."datname", "pg_stat_monitor"."userid", "pg_stat_monitor"."top_queryid", "pg_stat_monitor"."planid", "pg_stat_monitor"."query_plan", "pg_stat_monitor"."top_query", "pg_stat_monitor"."application_name", "pg_stat_monitor"."cmd_type", "pg_stat_monitor"."cmd_type_text", "pg_stat_monitor"."elevel", "pg_stat_monitor"."sqlcode", "pg_stat_monitor"."message", "pg_stat_monitor"."pgsm_query_id", "pg_stat_monitor"."dbid", "pg_stat_monitor"."blk_read_time", "pg_stat_monitor"."blk_write_time", "pg_stat_monitor"."total_exec_time", "pg_stat_monitor"."min_exec_time", "pg_stat_monitor"."max_exec_time", "pg_stat_monitor"."mean_exec_time", "pg_stat_monitor"."stddev_exec_time", "pg_stat_monitor"."total_plan_time", "pg_stat_monitor"."min_plan_time", "pg_stat_monitor"."max_plan_time", "pg_stat_monitor"."mean_plan_time", "pg_stat_monitor"."wal_records", "pg_stat_monitor"."wal_fpi", "pg_stat_monitor"."wal_bytes", "pg_stat_monitor"."plans", "pg_stat_monitor"."comments", "pg_stat_monitor"."bucket_start_time", "pg_stat_monitor"."username" FROM "pg_stat_monitor" WHERE queryid IS NOT NULL AND query IS NOT NULL AND bucket_done AND pgsm_query_id IS NOT NULL
WARNING: [pg_stat_monitor] pgsm_store: Hash table is out of memory and can no longer store queries!
DETAIL: You may reset the view or when the buckets are deallocated, pg_stat_monitor will resume saving queries. Alternatively, try increasing the value of pg_stat_monitor.pgsm_max.
WARNING: [pg_stat_monitor] pg_stat_monitor_internal: Hash table is out of memory and can no longer store queries!
DETAIL: You may reset the view or when the buckets are deallocated, pg_stat_monitor will resume saving queries. Alternatively, try increasing the value of pg_stat_monitor.pgsm_max.
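
Following the DETAIL hint, the two options seem to be resetting the collected statistics or enlarging the shared-memory buffers. A sketch of what that could look like (the target values are only examples, and pgsm_query_shared_buffer is my own addition beyond the hint; as far as I understand these are shared-memory sizes, so a PostgreSQL restart is needed after changing them):

-- Immediate relief: discard what pg_stat_monitor has accumulated so it can store queries again
SELECT pg_stat_monitor_reset();

-- Longer term: give the hash table more room (illustrative values, in MB)
ALTER SYSTEM SET pg_stat_monitor.pgsm_max = 1024;
ALTER SYSTEM SET pg_stat_monitor.pgsm_query_shared_buffer = 512;
-- ...followed by a PostgreSQL restart for the new sizes to take effect.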


My issue is very similar to this one: https://perconadev.atlassian.net/browse/PMM-8646

How can I increase the timeout for getting a response from PostgreSQL for postgres_exporter?


Hello!

The question remains open, and resolving this issue is a priority for us.

Why does an "i/o timeout" error occur in the postgres_exporter component under high PostgreSQL load, while all other metrics from the host are still being collected normally?

Log:

pmm-agent[2113148]: time="2025-11-17T11:50:38.255+00:00" level=info msg="Sending 2644 buckets." agentID=ae04d56c-6ec5-4c48-80da-cb8951b8eb7c component=agent-builtin type=qan_postgresql_pgstatmonitor_agent

pmm-agent[2113148]: time="2025-11-17T11:50:39.820+00:00" level=error msg="ts=2025-11-17T11:50:39.820Z caller=datasource.go:107 
level=error msg=\"Error opening connection to database\" dsn=\"postgres://sa_percona_monitoring:PASSWORD_REMOVED@server:5432/postgres?connect_timeout=1&sslmode=disable\" 
err=\"read tcp host:59514->host:5432: i/o timeout\"" agentID=85953d85-a453-4e9b-a67d-241144e64b6d component=agent-process type=postgres_exporter

pmm-agent[2113148]: time="2025-11-17T11:50:41.822+00:00" level=error msg="ts=2025-11-17T11:50:41.822Z caller=postgres_exporter.go:770 
level=error err=\"Error opening connection to database (postgres://sa_percona_monitoring:PASSWORD_REMOVED@server:5432/postgres?connect_timeout=1&sslmode=disable): 
read tcp host:59522->host:5432: i/o timeout\"" agentID=85953d85-a453-4e9b-a67d-241144e64b6d component=agent-process type=postgres_exporter

Under high load, when the postgres_exporter issue occurs, other connections and query executions proceed normally, and no other errors are observed on PostgreSQL itself. The only errors appear in the pmm-agent log, as shown above.
