What’s the problem?
we found that the field(schema) in most of records in the table(pmm.metrics) in Clickhouse is empty, which will affect when we want to filter for slow queries in a specific database.
from this article, we knew that pmm2 collect details of slow queries by analyzing MySQL slow query log, and we found the implementation for MySQL slow query log analyzing tool in percona/pmm-agent.
// file:pmm-agent/agents/mysql/slowlog/parser/parser.go
// line: 110
func (p *SlowLogParser) Run() {
//...
}
After observing the implementation of the source code of pmm-agent, we understand that the reason. The pmm-agent set field(schema) for queries that include declaration of database(use xxx;) in the slow query log.
We believe that pmm-agent will leave blanks for queries that are not clearly marked for the sake of rigor. However, we think that after a certain reasoning, we can actually get the database (or potential database) corresponding to the query.
For example
We make an example (part from percona/pmm-agent) as follow:
# User@Host: bookblogs[bookblogs] @ localhost [] Id: 56601
# Query_time: 0.321092 Lock_time: 0.000038 Rows_sent: 0 Rows_examined: 0
use dbnamea;
SET timestamp=1415210700;
SELECT field FROM table_a WHERE some_other_field = 'yahoo' LIMIT 1;
# User@Host: bookblogs[bookblogs] @ localhost [] Id: 56601
# Query_time: 0.000297 Lock_time: 0.000141 Rows_sent: 1 Rows_examined: 1
SET timestamp=1415210700;
SELECT field FROM table_b WHERE another_field = 'bazinga' AND site_id = 1;
# User@Host: backup[backup] @ localhost [] Id: 56458
# Query_time: 0.000558 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
SET timestamp=1415210700;
use `dbnameb`;
# User@Host: bookblogs[bookblogs] @ localhost [] Id: 56601
# Query_time: 0.000164 Lock_time: 0.000059 Rows_sent: 1 Rows_examined: 1
SET timestamp=1415210700;
SELECT another_field FROM table_c WHERE a_third_field = 'tiruriru' AND site_id = 1;
# User@Host: bookblogs[bookblogs] @ localhost [] Id: 56601
# Query_time: 0.000297 Lock_time: 0.000141 Rows_sent: 1 Rows_examined: 1
SET timestamp=1415210700;
SELECT field FROM dbnamec.table_d JOIN dbnamed.table_e ON dbnamec.table_d.site_id = dbnamed.table_e.site_id;
and the parse result of pmm-agent will be like as follow:
sql | schema |
---|---|
use dbnamea; SELECT field FROM table_a WHERE some_other_field = ‘yahoo’ LIMIT 1; | dbnamea |
SELECT field FROM table_b WHERE another_field = ‘bazinga’ AND site_id = 1; | |
use dbnameb; | dbnameb |
SELECT another_field FROM table_c WHERE a_third_field = ‘tiruriru’ AND site_id = 1; | |
SELECT field FROM dbnamec.table_d JOIN dbnamed.table_e ON dbnamec.table_d.site_id = dbnamed.table_e.site_id; |
after observing the logs we can complete rest of them:
sql | schema |
---|---|
use dbnamea; SELECT field FROM table_a WHERE some_other_field = ‘yahoo’ LIMIT 1; | dbnamea |
SELECT field FROM table_b WHERE another_field = ‘bazinga’ AND site_id = 1; | dbnamea |
use dbnameb; | dbnameb |
SELECT another_field FROM table_c WHERE a_third_field = ‘tiruriru’ AND site_id = 1; | dbnameb |
SELECT field FROM dbnamec.table_d JOIN dbnamed.table_e ON dbnamec.table_d.site_id = dbnamed.table_e.site_id; | dbnamec, dbnamed |
Maybe some suggestion
We design a logic for database name completion
- if
use xxx;
occurred;- record the database name
- when we found query, we parse the sql first:
- If there is a table in the statement that does not indicate the corresponding database, we will assign the database record in step-1.1 name to this table
- If there is a table in the statement that indicates the corresponding database, we directly use this database
However, the above logic will have such a problem, there may be more than one database name in a record. Can this problem be solved by adding new fields to the table(pmm.metric) ? Or is there any better way to solve this problem?