PMM2 has a dashboard called RED Method for MySQL Queries - Designed for PMM2
One of the panels contains an indicator p99Latency, the query statement is
SELECT
$timeSeries as t,
sum(m_query_time_sum)/sum(m_query_time_cnt) as AvgLatency,
avg(m_query_time_p99) as p99Latency
FROM $table
WHERE $timeFilter and service_type='mysql' and labels.value=['$labels_env']
GROUP BY t
ORDER BY t
In QAN, if you click on a query statement, a clickhouse query statement will be run behind it
The actual statement running in clickhouse is easy to catch
SELECT
SUM(num_queries) AS num_queries,
SUM(num_queries_with_errors) AS num_queries_with_errors,
SUM(num_queries_with_warnings) AS num_queries_with_warnings,
SUM(m_query_time_cnt) AS m_query_time_cnt,
SUM(m_query_time_sum) AS m_query_time_sum,
MIN(m_query_time_min) AS m_query_time_min,
MAX(m_query_time_max) AS m_query_time_max,
AVG(m_query_time_p99) AS m_query_time_p99,
...
FROM metrics
WHERE (period_start >= 1648656000) AND (period_start <= 1648742399) AND (queryid = 'C782C3D2DA2585FF') AND (replication_set IN ('mha_rd_geass')) AND (schema IN ('broker')) AND (service_name IN ('bj1-mysql-broker-prod-01_3306'))
GROUP BY queryid
WITH TOTALS
Query id: 4ba24d70-b315-4ef8-8cb9-7a5bb7b244e1
Row 1:
ββββββ
num_queries: 825001
num_queries_with_errors: 0
num_queries_with_warnings: 0
m_query_time_cnt: 8251
m_query_time_sum: 1130.888653561473
m_query_time_min: 0.000635
m_query_time_max: 3.109454
m_query_time_p99: 0.0037712189824731703
...
The above results show that QAN(and RED Method for MySQL Queries Dashboard) uses avg(m_query_time_p99) to calculate query βquery time P99β (0.0037712189824731703)
But is this really the right way to calculate it? I think it might not be.
Here is what I think is the correct way to calculate βquery time P99β:
SELECT quantileExactWeighted(0.99)(m_query_time_p99, toUInt64(total_queries)) AS p99
FROM
(
SELECT
m_query_time_p99,
sum(num_queries) AS total_queries
FROM pmm.metrics
WHERE (period_start >= 1648656000) AND (period_start <= 1648742399) AND (queryid = 'C782C3D2DA2585FF') AND (replication_set IN ('mha_rd_geass')) AND (schema IN ('broker')) AND (service_name IN ('bj1-mysql-broker-prod-01_3306'))
GROUP BY m_query_time_p99
)
Query id: e4ed1258-35e8-4fad-baae-45bba11d8852
βββββββp99ββ
β 0.004027 β
ββββββββββββ
I also did some calculations in python
SELECT
m_query_time_p99,
sum(num_queries)
FROM pmm.metrics
WHERE (period_start >= 1648656000) AND (period_start <= 1648742399) AND (queryid = 'C782C3D2DA2585FF') AND (replication_set IN ('mha_rd_geass')) AND (schema IN ('broker')) AND (service_name IN ('bj1-mysql-broker-prod-01_3306'))
GROUP BY m_query_time_p99
Query id: 8bedf2a1-351b-424c-afe2-18fda812247f
ββm_query_time_p99ββ¬βsum(num_queries)ββ
β 0.001691 β 1400 β
β 0.001471 β 2900 β
β 0.005632 β 900 β
β 0.001472 β 3600 β
β 0.001762 β 800 β
β 0.001396 β 1900 β
β 0.002547 β 1000 β
β 0.001795 β 900 β
...
import numpy as np
s = ((0.001691, 1400)
,(0.001471, 2900)
,(0.005632, 900)
,(0.001472, 3600)
,(0.001762, 800)
,(0.001396, 1900)
,(0.002547, 1000)
,(0.001795, 900)
...
)
arr = list()
for i in s:
query_time = i[0]
count = i[1]
for _ in range(count):
arr.append(query_time)
a = np.array(arr)
print(np.percentile(a, 99))
0.004027
I hope someone can answer the question, which calculation is the correct one?