I’m trying to optimize a query that often appears in the slow query log. Some background info: we have a site where we publish news articles.
Every article (nieuwsbericht) is linked to one country (land), one category (categorie) and several sectors, of which one is marked as ‘head’ sector.
Countries are linked to regions (regio), which are linked to a continent (werelddeel). This query is on a page which displays articles in a selected
region or continent. So the query contains the following restrictions:
- articles should be linked to a country in a particular contintent or region
- articles with a certain sector as ‘head’ sector (gerelateerd = 1) should not be retrieved
- articles should have the ‘published’ (gepubliceerd) status
- articles will be displayed in chronological order, starting with the latest, so the query is ordered by date (datum_public) and time (tijd_public).
The query:
SELECT n.nieuwsbericht_id, n.datum_public, n.tijd_public, n.titel, n.ondertitel, n.bericht, n.bericht_afkappen, n.afbeelding, n.afb_plaatsing, n.afb_breedte, n.afb_hoogte, n.link, n.aantal_reacties, c.omschrijving AS categorie, r.naam_en AS regio, w.naam_en AS werelddeel FROM regio r INNER JOIN land l ON (l.regio_id = r.regio_id) INNER JOIN werelddeel w ON (w.werelddeel_id = r.werelddeel_id) INNER JOIN nieuwsbericht n ON (n.land_id = l.land_id) INNER JOIN categorie c ON (c.categorie_id = n.categorie_id) INNER JOIN nieuwsbericht_has_sector nhs ON (nhs.nieuwsbericht_id = n.nieuwsbericht_id) WHERE r.regio_id = 22 AND nhs.sector_id < 50 AND nhs.gerelateerd = 1 AND n.gepubliceerd = 1 ORDER BY n.datum_public DESC, n.tijd_public DESC LIMIT 14;
The explain output:
±—±------------±------±-------±----------------------------------------------------------------------------±-----------------------±--------±-------------------------------------±-----±--------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-------±----------------------------------------------------------------------------±-----------------------±--------±-------------------------------------±-----±--------------------------------+| 1 | SIMPLE | r | const | PRIMARY,fkindex_werelddeel | PRIMARY | 1 | const | 1 | Using temporary; Using filesort | | 1 | SIMPLE | w | const | PRIMARY | PRIMARY | 1 | const | 1 | | | 1 | SIMPLE | l | ref | PRIMARY,fkindex_regio | fkindex_regio | 1 | const | 9 | Using index | | 1 | SIMPLE | n | ref | PRIMARY,fkindex_categorie,fkindex_land | fkindex_land | 6 | db.l.land_id | 59 | Using where | | 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 1 | db.n.categorie_id | 1 | | | 1 | SIMPLE | nhs | ref | PRIMARY,fkindex_nieuwsbericht,fkindex_sector | PRIMARY | 3 | db.n.nieuwsbericht_id | 1 | Using where | ±—±------------±------±-------±----------------------------------------------------------------------------±-----------------------±--------±-------------------------------------±-----±--------------------------------+
First thing I was thinking of was to make the query work without a filesort and temporary table. Can this be achieved by adding indices or modifying the
query? Are there any other ways in which the query can be optimized?
Or should I look for optimization on application or schema design level? For example by dropping some of the restrictions in the query and filter the
records in the application. Or, for example, have the region and continent id’s as duplicated rows in the news article table.