Hi everyone
I’m relatively new to MySQL and gradually learning new stuff, but I’m having a problem with a particular query. I’ve asked around in the PHP DevNet forums and it seems I should be getting much better performance on this query, as my tables are pretty small and the query’s not particularly advanced.
The structure is nicely normalised, something I’ve wanted to have for years. I’m gradually switching our company intranet subscribers database from filemaker to MySQL.
My server is Ubuntu Server 8.04 LTS, MySQL5, PHP5 and Apache2. On a pretty recent Intel Core 2 server with 4GB memory.
I’d really appreciate it if someone could give me some advice on this, as it’s been driving me crazy!
contacts has about 36,000 records
companies: 38,000
tags_data: 42,000
tags: 980
My total DB size is about 26MiB, so it’s tiny in comparison to many MySQL databases. I think this query should not be taking 32 seconds to complete!
Basically the tags table has various tags/classifications which can be applied to companies. So you can tag a company as MANUFACTURER, DISTRIBUTOR, SHOW EVENT ORGANISER… etc and there’s 980 different tags/classifications to choose from. Each company can have multiple tags applied, and the tags_data table stores which tags are applied to which company.
Here’s a quick diagram of part of the database which should help to visualise
green bold are the primary key columns. Red are the columns which link the data together.
Here’s the query I’m trying to run:
SELECT contacts
.contact_name
, companies
.company_name
, GROUP_CONCAT(tags
.name
ORDER BY tags
.name
ASC SEPARATOR ‘,’) AS classifications
FROM contacts
LEFT JOIN companies
USING(company_id
)LEFT JOIN tags_data
ON companies
.company_id
=tags_data
.company_id
LEFT JOIN tags
ON tags_data
.tag_id
=tags
.tag_id
WHERE contacts
.subscribed
='VIP’GROUP BY contacts
.contact_id
The syntax on that might be improved I guess, but I don’t know how to do that. That’s the limit of my MySQL knowledge really.
I’m trying to get contact_name, company_name and a column called classifications which is a comma separated list of the tags/classifications names that are selected for that particular company.
contacts.subscribed can have 3 different values: NULL, FREE or VIP
This query only returns 777 records, but I’ll be looking to return up to 20,000 with a variation of this query if I can get it running faster!
Initially this query took about 80 seconds, but I added a composite index (only just learnt about them) on the tags_data table (index called tags_companies) which reduced it to about 30 seconds.
I removed the ORDER BY or LIMIT from the query earlier as well, but it made very little difference.
Here’s the output of EXPLAIN on this query:
id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE contacts ref IX_subscribed IX_subscribed 15 const 1103 Using where; Using temporary; Using filesort1 SIMPLE companies eq_ref PRIMARY PRIMARY 4 dwcintranet.contacts.company_id 1 1 SIMPLE tags_data index company_id tags_companies 26 NULL 41930 Using index1 SIMPLE tags eq_ref PRIMARY PRIMARY 4 dwcintranet.tags_data.tag_id 1
So I think the problem is the Using filesort. I’ve tried taking the query apart to eliminate the filesort, and it seems it’s the 2nd join that’s causing the filesort: tags_data ON companies
But I don’t know what else I could index to sort this out
Any ideas/suggestions?
I would really appreciate some help on this, I’m convinced that this query shouldn’t be taking 30 seconds!
Thanks, Ben