performance issue

Hi,

The following query is taking 3 mins to execute and table has 30 million rows…

SELECT host_ip “Users”, SUM(events) “Connections”, TRUNCATE((SUM(bytes) * 1.0000), 2) “Cost (INR)”,
SUM(bytes) “MBytes” FROM vp_top_summary
WHERE firewall_report_id IN (1118,1142) AND host_id = -1 GROUP BY host_ip ORDER BY “MBytes” DESC LIMIT 20;

index information is as follows
mysql> show index from vp_top_summary;
±---------------±-----------±---------±-------------±-- -----------------±-
---------±------------±---------±-------±-----±-------- —±--------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | C
ollation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
±---------------±-----------±---------±-------------±-- -----------------±-
---------±------------±---------±-------±-----±-------- —±--------+
| vp_top_summary | 1 | vpts_ihf | 1 | host_ip | A
| 52290909 | NULL | NULL | YES | BTREE | |
| vp_top_summary | 1 | vpts_ihf | 2 | host_name | A
| 52290909 | NULL | NULL | YES | BTREE | |
| vp_top_summary | 1 | vpts_ihf | 3 | firewall_report_id | A
| 52290909 | NULL | NULL | YES | BTREE | |
| vp_top_summary | 1 | vpts_f | 1 | firewall_report_id | A
| 17 | NULL | NULL | YES | BTREE | |
| vp_top_summary | 1 | vpts_i | 1 | id | A
| 52290909 | NULL | NULL | | BTREE | |
±---------------±-----------±---------±-------------±-- -----------------±-
---------±------------±---------±-------±-----±-------- —±--------+
5 rows in set (0.45 sec)

any suggestions or ideas will be appreciated…

thanks in advance…

You could check what explain select … say.
Maybe You should add index to the host_id column ?

Hi…

Thank you for your ideas.

explain of the query is as follows

±—±------------±---------------±------±-------------- ±-------±--------+
------±--------±-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
±—±------------±---------------±------±-------------- ±-------±--------+
------±--------±-----------------------------+
| 1 | SIMPLE | vp_top_summary | range | vpts_f | vpts_f | 5 |
NULL | 1607657 | Using where; Using temporary |
±-
–±------------±---------------±------±-------------
-±-------±--------+

Now i have removed the condition "AND host_id = -1 " from the query, so index is not required.

will the ordering of the index column help?
as the table has 50 million rows now… changing index order will take a very long time. unless am sure it makes difference do not want to take a step on that.

please keep posting your ideas or suggestions if you have any.

thank you so much!

It would scan 1607657 rows (explain sometimes displays very unprecise values).

[B]Quote:[/B]

#1 select count() as rows FROM vp_top_summary
WHERE firewall_report_id IN (1118,1142) GROUP BY host_ip
#2 select count(
) as rows FROM vp_top_summary
WHERE firewall_report_id IN (1118,1142)

#1 query (I hope I didn't make any mistake in it) will show how many rows would be received. #2 query will show how many rows mysql have to fetch before grouping. Compare it with that 1.6M from explain. Is it smaller ?

I can see only one way to solve the problem. This one table can stay, but You should add second one with single row per firewall_report_id and host_ip ({firewall_report_id,host_ip} UNIQUE). The cron could run updating script every minute. After that You will get less rows without grouping. The problem is still with that order by. I have to think about it. Right now I have got no idea how to deal with it.

I hope this reply isn’t too late.

Have you tried:

SELECT *FROM ( SELECT host_ip “Users”, SUM(events) “Connections”, TRUNCATE((SUM(bytes) * 1.0000), 2) “Cost (INR)”, SUM(bytes) “MBytes” FROM vp_top_summary WHERE firewall_report_id IN (1118,1142) AND host_id = -1 GROUP BY host_ip)ORDER BY “MBytes” DESC LIMIT 20;

with another index on (firewall_report_id, host_ip)?