I’m trying to build a webapplication where users can search for a person having a particular preference for color and material. To store this information I use the following structure (a MySQL dump can be found at the end of this post):
*table person with fields:
-persid: autoincrement id
-name: name of the person
*table material with fields:
-materialid: autoincrement id
-material: name of the material eg “wood”
*table color with fields:
-colorid: autoincrement id
-color: name of the color eg “green”
*table persmaterial with fields:
-persmatid: autoincrement id
-persid: link to table person
-materialid: link to table material
*table perscolor with fields:
-perscolorid: autoincrement id
-persid: link to table person
-colorid: link to table color
In the webapplication the search can be entered by the users as a kind of pseudo query:
(color=red OR color=blue) AND color=green AND material=iron
My question is: how can I automatically transform this pseudo query into an efficient MySQL query?
I have tried out some different options:
Option 1:
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=1 OR pc.colorid=2) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) UNION
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=2 OR pc.colorid=3) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1))
Remarks:
*I do not see how to turn a general pseudo query into a query like the one in option 1, except for turning the pseudo query into a sum of products form where the sulms would correspond to the UNIONs. IS there a clever way to obtain such a sum of products form from an arbitrary pseudo query?
Option 2:
SELECT persid FROM person p WHERE
(EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=1 AND p.persid=pc.persid)
OR
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=3 AND p.persid=pc.persid))
AND
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=2 AND p.persid=pc.persid)
AND
EXISTS(SELECT * FROM persmaterial pm WHERE pm.materialid=2 AND p.persid=pm.persid)
Remarks:
*very easy to get from pseudo query to MySQL query but what about performance?
Option 3:
SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE
p.persid=pc.persid
AND
(pc.colorid=1 OR pc.colorid=2 OR pc.colorid=3)
AND p.persid=pm.persid
AND pm.materialid=2
GROUP BY p.persid HAVING
sum(case when pc.colorid in (‘1’,‘3’) then 1 else 0 end) >= 1
AND
sum(case when pc.colorid=‘2’ then 1 else 0 end)>=1
AND
sum(case when pm.materialid=‘2’ then 1 else 0 end)>=1
Remarks:
*this option requires the pseudo query to be turned into a product of sums form; again is their a clever way to obtain such a form;
Option 4
SELECT DISTINCT pc1.persid FROM perscolor pc1
INNER JOIN perscolor pc2
ON pc1.persid=pc2.persid AND pc2.colorid=2
INNER JOIN persmaterial pm1
ON pc1.persid=pm1.persid AND pm1.materialid=2
LEFT OUTER JOIN perscolor pc3
ON pc1.persid=pc3.persid AND pc3.colorid=1
LEFT OUTER JOIN perscolor pc4
ON pc1.persid=pc4.persid AND pc4.colorid=3
WHERE COALESCE(pc3.persid,pc4.persid) IS NOT NULL
Remarks:
*this option requires the pseudo query to be turned into a product of sums form
Option 5:
SELECT p.persid FROM person p, persmaterial pm,perscolor pc1,perscolor pc2,perscolor pc3 WHERE p.persid=pm.persid AND p.persid=pc1.persid AND p.persid=pc2.persid AND p.persid=pc3.persid AND (pc1.colorid=1 OR pc2.colorid=3) AND pc3.colorid=2 AND pm.materialid=2 GROUP BY p.persid
Remarks:
*very easy to get from pseudo query to MySQL query but what about performance?
– phpMyAdmin SQL Dump
– version 2.6.1
– http://www.phpmyadmin.net
– Host: localhost
– Generation Time: Oct 19, 2006 at 01:13 PM
– Server version: 4.1.9
– PHP Version: 4.3.10
– Database: aston
–
– Table structure for table color
CREATE TABLE color
(
colorid
int(11) NOT NULL auto_increment,
color
varchar(30) NOT NULL default ‘’,
PRIMARY KEY (colorid
)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
–
– Dumping data for table color
INSERT INTO color
VALUES (1, ‘red’);
INSERT INTO color
VALUES (2, ‘green’);
INSERT INTO color
VALUES (3, ‘blue’);
INSERT INTO color
VALUES (4, ‘yellow’);
–
– Table structure for table material
CREATE TABLE material
(
materialid
int(11) NOT NULL auto_increment,
material
varchar(30) NOT NULL default ‘’,
PRIMARY KEY (materialid
)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
–
– Dumping data for table material
INSERT INTO material
VALUES (1, ‘wood’);
INSERT INTO material
VALUES (2, ‘iron’);
–
– Table structure for table perscolor
CREATE TABLE perscolor
(
perscolorid
int(11) NOT NULL auto_increment,
persid
int(11) NOT NULL default ‘0’,
colorid
int(11) NOT NULL default ‘0’,
PRIMARY KEY (perscolorid
)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;
–
– Dumping data for table perscolor
INSERT INTO perscolor
VALUES (1, 1, 1);
INSERT INTO perscolor
VALUES (2, 1, 2);
INSERT INTO perscolor
VALUES (3, 2, 1);
INSERT INTO perscolor
VALUES (5, 3, 3);
INSERT INTO perscolor
VALUES (6, 3, 2);
–
– Table structure for table persmaterial
CREATE TABLE persmaterial
(
persmatid
int(11) NOT NULL auto_increment,
persid
int(11) NOT NULL default ‘0’,
materialid
int(11) NOT NULL default ‘0’,
PRIMARY KEY (persmatid
)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
–
– Dumping data for table persmaterial
INSERT INTO persmaterial
VALUES (1, 1, 1);
INSERT INTO persmaterial
VALUES (2, 1, 2);
INSERT INTO persmaterial
VALUES (3, 2, 1);
INSERT INTO persmaterial
VALUES (5, 3, 2);
–
– Table structure for table person
CREATE TABLE person
(
persid
int(11) NOT NULL auto_increment,
name
varchar(30) NOT NULL default ‘’,
PRIMARY KEY (persid
)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
–
– Dumping data for table person
INSERT INTO person
VALUES (1, ‘john’);
INSERT INTO person
VALUES (2, ‘emily’);
INSERT INTO person
VALUES (3, ‘liz’);