Hello, this is my first post )
I’m developing a log analysis application and I’m having a performance problem that I was hoping that someome could help (thanks in advance)
I have table QUERY that has almost 3 million rows. I also have a table Location containing rows for most cities of the world. The QUERY table has a LID (location id) that matches a specific location).
Although the complete query is not this simple, I’ve discovered where the performance problem arises.
The query is as simple as this:
SELECT QUERY, COUNT(*) AS FREQUENCY
FROM QUERY
where lid=2703
GROUP BY QUERY ORDER BY FREQUENCY DESC LIMIT 10
When I run this simple query with other lid values it is quite fast, but I discovered that a very large percentage of the queries has lid=2703 (over 1.8 million in almost 3 million). When I run the query, the results take over 200sec to be processed.
This is the table:
CREATE TABLE QUERY (
QID INT UNSIGNED NOT NULL, (query id)
SID INT UNSIGNED NOT NULL, (session id)
QUERY VARCHAR(255) NOT NULL,
LID INT UNSIGNED, (location id)
PRIMARY KEY(QID)
)ENGINE = MYISAM;
SHOW INDEXES FROM QUERY
table |Non_unique |key_name |seq_in_index |Column_name |Collation |Cardinality |Sub_part |Packed |Null |Index_type
‘query’, 0, ‘PRIMARY’, 1, ‘QID’, ‘A’, 2387075, , ‘’, ‘’, ‘BTREE’, ‘’
‘query’, 1, ‘QUERYSID_NDX’, 1, ‘SID’, ‘A’, 596768, , ‘’, ‘’, ‘BTREE’, ‘’
‘query’, 1, ‘QUERYLID_NDX’, 1, ‘LID’, ‘A’, 132615, , ‘’, ‘YES’, ‘BTREE’, ‘’
‘query’, 1, ‘QUERYQ_NDX’, 1, ‘QUERY’, ‘A’, 341010, , ‘’, ‘’, ‘BTREE’, ‘’
‘query’, 1, ‘QUERY_NDX_FULL’, 1, ‘QUERY’, ‘’, 140416, , ‘’, ‘’, ‘FULLTEXT’, ‘’
(At first I had only the full index for column query, then I added the “normal” B-tree index)
-----EXPLAIN
1, ‘SIMPLE’, ‘QUERY’, ‘index’, ‘QUERYLID_NDX’, ‘QUERYQ_NDX’, ‘767’, ‘’, 89, ‘Using where; Using temporary; Using filesort’
How can I display the top 10 queries for some location OR many locations, with good performance?
Could you please help?
I know the word “urgent” is very used by this is the case…
Thx