SQL query optimalization for N:M relation

Dear all,

I have a problem with one SQL query. I want to list all records from one table (in example bellow: “table1”) matching some conditions - which are stored in “table2”.

The structure is attached to this post:

The problematic query follows:
SELECT DISTINCT table1.*
FROM table1
INNER JOIN resttyperest ON table2.table1_id=table1.id
WHERE table2.table3_id IN (6, 10, 15)
ORDER BY restaurant.name;

This query uses temporary table and also filesort.
So I tried to rewrite it to:

SELECT table1.*
FROM table1
WHERE
(SELECT id FROM table2
WHERE table2.table3_id IN (6, 10, 15)
AND table2.table1_id=table1.id
) IS NOT NULL
ORDER BY table1.name

This query uses only filesort and also runs faster (10-times) ).

  1. Are there any cons if I decide to use the second query with subqueries (on MySQL 5.0.x)?

  2. Do you have any ideas how to improve the second query?

  3. Do you use similar table-structure, if you have records (table1), types (table3) and their relation is N:M?

  4. Any other suggestions, notes? )

Thanks…

A suggestion is that you post the create table statements also. That way we can all easily create the same tables and we can also directly see which indexes you have created.
And if you want to rewrite your query before posting it, you should check better for spelling errors etc before posting it.

But my suggestions are:
1.
Do you actually need DISTINCT?
Isn’t the records unique withing table1 already?

Your original query should look like this:

SELECT table1.*FROM table1INNER JOIN resttyperest ON ( table2.table1_id=table1.id AND table2.table3_id IN (6, 10, 15) )ORDER BY restaurant.name

And then I suggest index:
table2(table3_id, table1_id)
which combined with the primary key id for table1 should find the records very fast. It would still need the filesort.

4
But possibly with index:
table1(name, id))
you should be able to avoid the filesort also.

Hi Sterin…

the correct SQL queries follows:

SELECT DISTINCT table1.*FROM table1 INNER JOIN table2 ON (table2.table1_id=table1.id)WHERE table2.table3_id IN (6, 10, 15)ORDER BY table1.name

The query was rewritten to:

SELECT table1.*FROM table1WHERE (SELECT COUNT(id) FROM table2 WHERE table2.table3_id IN (6, 10, 15) AND table2.table1_id=table1.id ) > 0ORDER BY table1.name

  1. Yes, I need DISTINCT in the first query. Why? Because one record from table1 can be “joined” with many rows in “table2”. But I want to get a row from table1 only once (and only if it fulfills conditions from table2). Without DISTINCT I get:
    id | name
    1 | Name1
    1 | Name1
    1 | Name1
    5 | Name5
    5 | Name5

  2. I tried to move that condition from WHERE clause to ON clause, but it seems to be considered as the same query in MySQL (time spent on this query is equal and EXPLAIN gives the same result)

  3. Currently I have the indices on:

a) table1(id) b) table1(name) c) table2(table1_id) d) table2(table3_id) e) table2(id, table1_id, table3_id) f) table3(id)

Do you think, that I should add table2(table3_id, table1_id)? Maybe it is in a duplicity to indices c) and d).

  1. Yes, the same question: I already have table1(id) and table1(name). Could your index improve performance of that query?