PMM v3 postgres performance with timescaledb

Hi, I’m running the pmm-agent on both my prod and dev databases and noticing really high CPU when installing the postgres service (node monitoring isn’t causing any issues) on my smaller dev instances (using t3a.medium on AWS). Without the PMM postgres service installed CPU is not going above 5% but with the standard metrics resolution, we were seeing the CPU max out consistently and the database crashing. We have managed to stabilise the database by having a custom metrics resolution with less frequent polling and I have reimplemented the PostgreSQL services in PMM to use pg_stat_statements instead of pg_stat_monitor although I’m not sure this last step made much of a difference.

Looking at the queries that were causing the high CPU, it is mainly this one here:

SELECT
current_database() datname,
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
COALESCE(last_vacuum, $1) as last_vacuum,
COALESCE(last_autovacuum, $2) as last_autovacuum,
COALESCE(last_analyze, $3) as last_analyze,
COALESCE(last_autoanalyze, $4) as last_autoanalyze,
vacuum_count,
autovacuum_count,
analyze_count,
autoanalyze_count,
pg_total_relation_size(relid) as total_size
FROM
pg_stat_user_tables

I can see that the query above is taking on average 1.87 seconds!

Because we are using timsecaledb the number of rows returned is > 6000, the large majority (i.e 99%) of which are timescaledb’s partitioned chunk tables for our older data.

As a new user to PMM, what are our options here in terms of either fine tuning the query or disabling this particular query entirely?

Hi, it’s happening because you have a lot of user tables. You can disable this query by removing it from /usr/local/percona/pmm/collectors/custom-queries/postgresql/low-resolution/queries-lr.yml on client instance.

Thanks for the tip. I have removed that particular query from the queries-lr.yml on the client instance, restarted the pmm-agent and even recreated the postgres service but that query still keeps running regardless. There are no pmm-agent journal logs that suggest to me that something is wrong either.

What is the process for ensuring the latest changes are read from the custom query files?

oh, sorry, I found another place where we could collect these metrics. Please try to re-add postgresql service with flag --disable-collectors=stat_user_tables

Thanks that did the trick!