Hey Guys,
I’m in the middle of designing a custom MySQL query analyzer to use on our own servers, and if/once I get it stable enough I’ll be releasing it as open source as well.
I have done a simple Workbench mockup of the schema design, and I would like some advice on what I should watch out for, and on what changes should/could be made to make things more efficient.
Explanation:
The “queries” table is the primary table in the whole diagram; it mainly acts as a “mapper” between itself and the databases, query_binds and query_templates tables, which are explained in more detail further down.
Tables Explained:
- query_templates - This table is designed to store a query in a format with placeholders that match up to binds, for example: SELECT * FROM table WHERE column = '?', where the ? values are stored separately from the query so queries can be normalised (and aggregated).
- query_binds - This table is designed to store all the data for the query templates, so a query_template plus a query_bind make up a query that can be run.
- queries - This table stores every query's execution details, and also acts as the primary mapper table tying together the query_templates, query_binds and databases tables.

The rest of the tables should be fairly self-explanatory; a rough sketch of the three core tables follows below.
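To make that concrete, here is a minimal sketch of how I currently picture the three core tables, plus sample rows showing how one raw query would be decomposed. All column names beyond the ids described above are placeholders, and I haven't settled on whether bind values should be grouped into "bind sets" (as sketched here) or hang directly off the queries table:

    CREATE TABLE query_templates (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        query_type_id INT UNSIGNED NOT NULL,      -- FK to query_type
        query TEXT NOT NULL                       -- normalised text with ? placeholders
    ) ENGINE=InnoDB;

    CREATE TABLE query_binds (
        id INT UNSIGNED NOT NULL,                 -- bind-set id shared by all values of one execution
        position TINYINT UNSIGNED NOT NULL,       -- which ? in the template this value fills
        value TEXT,                               -- the literal pulled out of the raw query
        PRIMARY KEY (id, position)
    ) ENGINE=InnoDB;

    CREATE TABLE queries (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        query_template_id INT UNSIGNED NOT NULL,  -- FK to query_templates
        query_bind_id INT UNSIGNED,               -- FK to a query_binds bind set (NULL if no binds)
        database_id INT UNSIGNED NOT NULL,        -- FK to databases
        exec_time DECIMAL(10,6) NOT NULL,         -- seconds
        num_rows INT UNSIGNED NOT NULL,
        `timestamp` DATETIME NOT NULL
    ) ENGINE=InnoDB;

    -- e.g. the raw query  SELECT * FROM table WHERE column = 123  would be stored as:
    INSERT INTO query_templates (query_type_id, query)
    VALUES (1, 'SELECT * FROM table WHERE column = ?');
    INSERT INTO query_binds (id, position, value) VALUES (1, 1, '123');
    -- ...and every execution of it becomes one row in queries referencing template 1 and bind set 1.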
The Problem:
As you probably already know, the general query log (our initial source of query data) fills up amazingly quickly, and we will soon have hundreds of millions of rows in our queries table. My concern is that MySQL will be unable to handle this with the tables designed the way I have them. However, I do not know how to design this any better than the normalised state you see it in.
An example query (off the top of my head) may look like:

    SELECT
        SUM(q.exec_time) AS total_exec_time,
        SUM(q.num_rows) AS total_num_rows,
        SUM(q.exec_time) / COUNT(q.exec_time) AS avg_exec_time,
        SUM(q.num_rows) / COUNT(q.num_rows) AS avg_num_rows,
        qtemp.query,
        qtype.name AS query_type,
        dbs.name,
        MAX(q.timestamp) AS latest_run,
        MIN(q.timestamp) AS first_run
    FROM queries AS q
    LEFT JOIN query_templates AS qtemp ON q.query_template_id = qtemp.id
    LEFT JOIN query_type AS qtype ON qtemp.query_type_id = qtype.id
    LEFT JOIN databases AS dbs ON dbs.id = q.database_id
    WHERE dbs.name LIKE '%example%' AND qtype.name LIKE '%example2%'
    GROUP BY q.query_template_id
    ORDER BY latest_run DESC
As you can see, it is important that all of the tables can be joined within a single statement, as WHERE and ORDER BY clauses will be applied to many different pieces of information, depending on what the user requests.
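For a query like that I'm assuming, at a minimum, indexes along these lines on the join and filter columns (the names are mine, and the exact choices would need verifying with EXPLAIN):

    CREATE INDEX idx_queries_template ON queries (query_template_id, `timestamp`);
    CREATE INDEX idx_queries_database ON queries (database_id);
    CREATE INDEX idx_templates_type ON query_templates (query_type_id);
    -- note: LIKE '%example%' has a leading wildcard, so an index on databases.name
    -- or query_type.name would not help those WHERE clauses anyway.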
Any constructive feedback is welcome. This is only on the drawing board at the moment, and I won't be starting development until another project has been completed, so I have plenty of time to alter things wherever necessary. I'm guessing partitions or separate tables would be a good solution, but I wouldn't know where to start.
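For example, if range partitioning of the queries table turns out to be the way to go, is something along these lines (partitioning by month, dates purely illustrative) roughly what it would look like?

    CREATE TABLE queries (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        query_template_id INT UNSIGNED NOT NULL,
        query_bind_id INT UNSIGNED,
        database_id INT UNSIGNED NOT NULL,
        exec_time DECIMAL(10,6) NOT NULL,
        num_rows INT UNSIGNED NOT NULL,
        `timestamp` DATETIME NOT NULL,
        PRIMARY KEY (id, `timestamp`)   -- MySQL requires the partitioning column in every unique key
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(`timestamp`)) (
        PARTITION p2012_01 VALUES LESS THAN (TO_DAYS('2012-02-01')),
        PARTITION p2012_02 VALUES LESS THAN (TO_DAYS('2012-03-01')),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

    -- old months could then be dropped cheaply:
    -- ALTER TABLE queries DROP PARTITION p2012_01;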
Also, if anyone from Percona reads this: I bought your book a while back but haven't had a chance to read it yet. If this sort of thing is covered in it, would you be able to point me to the relevant section?