Hi, I’m running the pmm-agent on both my prod and dev databases and noticing really high CPU when installing the postgres service (node monitoring isn’t causing any issues) on my smaller dev instances (using t3a.medium on AWS). Without the PMM postgres service installed CPU is not going above 5% but with the standard metrics resolution, we were seeing the CPU max out consistently and the database crashing. We have managed to stabilise the database by having a custom metrics resolution with less frequent polling and I have reimplemented the PostgreSQL services in PMM to use pg_stat_statements instead of pg_stat_monitor although I’m not sure this last step made much of a difference.
Looking at the queries that were causing the high CPU, it is mainly this one here:
SELECT
current_database() datname,
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
COALESCE(last_vacuum, $1) as last_vacuum,
COALESCE(last_autovacuum, $2) as last_autovacuum,
COALESCE(last_analyze, $3) as last_analyze,
COALESCE(last_autoanalyze, $4) as last_autoanalyze,
vacuum_count,
autovacuum_count,
analyze_count,
autoanalyze_count,
pg_total_relation_size(relid) as total_size
FROM
pg_stat_user_tables
I can see that the query above is taking on average 1.87 seconds!
Because we are using timsecaledb the number of rows returned is > 6000, the large majority (i.e 99%) of which are timescaledb’s partitioned chunk tables for our older data.
As a new user to PMM, what are our options here in terms of either fine tuning the query or disabling this particular query entirely?