Query optimization

MySQL 5.0 with all tables using InnoDB.

Our application has been “live” for approximately 1 year. I have been responsible for writing most of the queries - especially for the big, convoluted business reports - but not for creating or managing the database. Several years worth of historical data was imported before going live so many of the tables had 50-100 thousand records on day 1. I wrote and optimized the queries to use the available indexes and referenced the EXPLAIN output regularly. All but a few queries run in well under 1 second and the rest are massive but still return in less than 5 seconds.

Well, they did. Recently a few of the reports have been running incredibly slow - like 2-3 minutes! Running the SQL statements directly against the database reveals that the bottleneck is not in the application but in the queries themselves. Checking EXPLAIN reveals that the optimizer appears to “flip” the order of the tables (for lack of a better explanation). For example, a particular table contains the department/division and the date of the transaction. The “filters” in the WHERE all apply to this table it used to be at or near the beginning of the query plan. The results are ordered by a column in a different table however (a lookup table - not details). Now, this first table shows up at the end and the order by table is near the top which would seem to me that the query ends up gathering the cartesian product of of all the related data then weeds it down to the small set that was asked for.

This latest query is literally all INNER JOINs but in two previous instances the slow queries used subqueries in the FROM clause. It seemed like those were losing an index on the inner query but I don’t see how this latest one would have that exact issue. I can currently make them all faster by using the STRAIGHT_JOIN keyword but I know that is not an optimal solution (is it?). Out of curiosity I ran an OPTIMIZE TABLE (for the particular table referenced in the WHERE clause) and it fixed the query plan on one server but not in three others with an identical setup (one on the same machine).

I found this site through a Google search on performance optimization and saw that others have experienced similar results with the optimizer picking an inferior plan. I didn’t really see a solution other than using index hints (or the previously mentioned STRAIGHT_JOIN). Is there a definitive answer on why MySQL would choose a different query plan? Will hints help or can MySQL just choose to ignore them one day and I’ll have to start all over?

Thanks,
Troy

I don’t really have enough information to comment on your specific case, but in the general case, it’s all decided by statistics on the data distribution. These are just estimated on InnoDB. Even on other engines where they are not estimates, the optimizer doesn’t always know the true cost of a query, leading to things like this:

http://www.mysqlperformanceblog.com/2008/04/28/the-mysql-opt imizer-the-os-cache-and-sequential-versus-random-io/

Hey, thanks! That’s almost exactly how mine looked when it ran fast (basically 1 table scan of 50k rows and a bunch of joins with 1 row each). When it was slow most of the joined tables had a few rows until it got to the largest table.

So is STRAIGHT_JOIN an accepted method for dealing with these circumstances? I had read elsewhere that it should be avoided as it keeps the optimizer from handling changes in the data (more rows, different/fragmented indexes, etc.). However, if I’m going to run into random performance problems I think I can deal with setting my join order in stone.

Troy

It’s acceptable for me when the optimizer’s heuristics aren’t giving a good query plan! I don’t like to use plan hints, but if you know that there’s only one good plan, by all means hint it.

If you use a tool like mk-upgrade, you might fend off regressions in the event that the optimizer chooses a worse plan with the hint in the future.