Questionable QAN Statistics in PMM2.30


I’m running PMM 2.30 and struggling to make sense of what QAN is collecting; to be more precise QAN “Query count” seems to be collecting some completely incoherent stats along with valid ones.

In a test to make sense of the stats I chose a particular host, then a specific query digest and selected the count_star figure from the events_statements_summary_by_digest table, wait 10 minutes then select the figure again, by my reckoning the difference should at least in some way correlate to the Query count in QAN for that time period. However here is the gist of what I see in QAN for the duration between the same timestamps for 3 sample queries:

Query 1: count_star increment = 29826 VS QAN 49.71 QPS / “sum 29.83k”
Query 2: count_star increment = 5963 VS QAN 9.91 QPS / “sum 5.95k”
Query 3: count_star increment = 3368 VS QAN 14.79k QPS / “sum 8.87m”

The first two sums tally, but the 3rd is complete nonsense. By my calculation this 3rd digest is about 5.61 qps.

In my past contact with grafana I recall having sudden jumps due to counter stats having data from “old incarnations” of the same host, and I wondered if this could be happening here; indeed this host has been destroyed and re-incarnated a few times over the past few months, but not in recent weeks.

Following that theory I tried setting DATA_RETENTION on the pmm-server to 24h, which clears out the old QAN stats, but the issue persists.

Can anyone explain what might be going off, or where I can dig down to see some raw data within pmm? I recall writing promql to debug grafana issues in the past but my memory is alas a blank on how - could anyone humour me and tell me how to dig into this issue?

Many thanks

1 Like

So it looks like I’ve got a situation where QAN is getting an inflated figure. Here’s what I did to compare.

Ran the following on the minute on the database host until I saw the uplifted figure in QAN:
SELECT current_timestamp, digest_text, count_star FROM performance_schema.events_statements_summary_by_digest WHERE digest = 'bdf11382a52f1c39d113a784524af465';

This logged:

|    2022-10-05 09:13:57|SHOW SLAVE STATUS |41379212|
|    2022-10-05 09:14:57|SHOW SLAVE STATUS |41380265|
|    2022-10-05 09:15:57|SHOW SLAVE STATUS |41381319|
|    2022-10-05 09:16:57|SHOW SLAVE STATUS |41382372|

Then on the pmm server within clickhouse:

select queryid, node_name, period_start, num_queries from metrics where period_start>'2022-10-05 09:14:00' and period_start<'2022-10-05 09:16:00' and queryid = 'bdf11382a52f1c39d113a784524af465' and node_name = '<database hostname here>';

│ bdf11382a52f1c39d113a784524af465 │ <database hostname> │ 2022-10-05 09:15:00 │    41381320 │