Not the answer you need?
Register and ask your own question!

performance of query for normalized DB

phlypephlype EntrantCurrent User Role Beginner
I'm trying to build a webapplication where users can search for a person having a particular preference for color and material. To store this information I use the following structure (a MySQL dump can be found at the end of this post):
*table person with fields:
-persid: autoincrement id
-name: name of the person
*table material with fields:
-materialid: autoincrement id
-material: name of the material eg "wood"
*table color with fields:
-colorid: autoincrement id
-color: name of the color eg "green"
*table persmaterial with fields:
-persmatid: autoincrement id
-persid: link to table person
-materialid: link to table material
*table perscolor with fields:
-perscolorid: autoincrement id
-persid: link to table person
-colorid: link to table color

In the webapplication the search can be entered by the users as a kind of pseudo query:
(color=red OR color=blue) AND color=green AND material=iron

My question is: how can I automatically transform this pseudo query into an efficient MySQL query?
I have tried out some different options:


Option 1:
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=1 OR pc.colorid=2) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) UNION
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=2 OR pc.colorid=3) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1))
Remarks:
*I do not see how to turn a general pseudo query into a query like the one in option 1, except for turning the pseudo query into a sum of products form where the sulms would correspond to the UNIONs. IS there a clever way to obtain such a sum of products form from an arbitrary pseudo query?


Option 2:
SELECT persid FROM person p WHERE
(EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=1 AND p.persid=pc.persid)
OR
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=3 AND p.persid=pc.persid))
AND
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=2 AND p.persid=pc.persid)
AND
EXISTS(SELECT * FROM persmaterial pm WHERE pm.materialid=2 AND p.persid=pm.persid)
Remarks:
*very easy to get from pseudo query to MySQL query but what about performance?

Option 3:
SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE
p.persid=pc.persid
AND
(pc.colorid=1 OR pc.colorid=2 OR pc.colorid=3)
AND p.persid=pm.persid
AND pm.materialid=2
GROUP BY p.persid HAVING
sum(case when pc.colorid in ('1','3') then 1 else 0 end) >= 1
AND
sum(case when pc.colorid='2' then 1 else 0 end)>=1
AND
sum(case when pm.materialid='2' then 1 else 0 end)>=1
Remarks:
*this option requires the pseudo query to be turned into a product of sums form; again is their a clever way to obtain such a form;

Option 4
SELECT DISTINCT pc1.persid FROM perscolor pc1
INNER JOIN perscolor pc2
ON pc1.persid=pc2.persid AND pc2.colorid=2
INNER JOIN persmaterial pm1
ON pc1.persid=pm1.persid AND pm1.materialid=2
LEFT OUTER JOIN perscolor pc3
ON pc1.persid=pc3.persid AND pc3.colorid=1
LEFT OUTER JOIN perscolor pc4
ON pc1.persid=pc4.persid AND pc4.colorid=3
WHERE COALESCE(pc3.persid,pc4.persid) IS NOT NULL
Remarks:
*this option requires the pseudo query to be turned into a product of sums form

Option 5:
SELECT p.persid FROM person p, persmaterial pm,perscolor pc1,perscolor pc2,perscolor pc3 WHERE p.persid=pm.persid AND p.persid=pc1.persid AND p.persid=pc2.persid AND p.persid=pc3.persid AND (pc1.colorid=1 OR pc2.colorid=3) AND pc3.colorid=2 AND pm.materialid=2 GROUP BY p.persid
Remarks:
*very easy to get from pseudo query to MySQL query but what about performance?



-- phpMyAdmin SQL Dump
-- version 2.6.1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Oct 19, 2006 at 01:13 PM
-- Server version: 4.1.9
-- PHP Version: 4.3.10
--
-- Database: `aston`
--

--

--
-- Table structure for table `color`
--

CREATE TABLE `color` (
`colorid` int(11) NOT NULL auto_increment,
`color` varchar(30) NOT NULL default '',
PRIMARY KEY (`colorid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;

--
-- Dumping data for table `color`
--

INSERT INTO `color` VALUES (1, 'red');
INSERT INTO `color` VALUES (2, 'green');
INSERT INTO `color` VALUES (3, 'blue');
INSERT INTO `color` VALUES (4, 'yellow');

--

--
-- Table structure for table `material`
--

CREATE TABLE `material` (
`materialid` int(11) NOT NULL auto_increment,
`material` varchar(30) NOT NULL default '',
PRIMARY KEY (`materialid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

--
-- Dumping data for table `material`
--

INSERT INTO `material` VALUES (1, 'wood');
INSERT INTO `material` VALUES (2, 'iron');

--

--
-- Table structure for table `perscolor`
--

CREATE TABLE `perscolor` (
`perscolorid` int(11) NOT NULL auto_increment,
`persid` int(11) NOT NULL default '0',
`colorid` int(11) NOT NULL default '0',
PRIMARY KEY (`perscolorid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;

--
-- Dumping data for table `perscolor`
--

INSERT INTO `perscolor` VALUES (1, 1, 1);
INSERT INTO `perscolor` VALUES (2, 1, 2);
INSERT INTO `perscolor` VALUES (3, 2, 1);
INSERT INTO `perscolor` VALUES (5, 3, 3);
INSERT INTO `perscolor` VALUES (6, 3, 2);

--

--
-- Table structure for table `persmaterial`
--

CREATE TABLE `persmaterial` (
`persmatid` int(11) NOT NULL auto_increment,
`persid` int(11) NOT NULL default '0',
`materialid` int(11) NOT NULL default '0',
PRIMARY KEY (`persmatid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;

--
-- Dumping data for table `persmaterial`
--

INSERT INTO `persmaterial` VALUES (1, 1, 1);
INSERT INTO `persmaterial` VALUES (2, 1, 2);
INSERT INTO `persmaterial` VALUES (3, 2, 1);
INSERT INTO `persmaterial` VALUES (5, 3, 2);

--

--
-- Table structure for table `person`
--

CREATE TABLE `person` (
`persid` int(11) NOT NULL auto_increment,
`name` varchar(30) NOT NULL default '',
PRIMARY KEY (`persid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

--
-- Dumping data for table `person`
--

INSERT INTO `person` VALUES (1, 'john');
INSERT INTO `person` VALUES (2, 'emily');
INSERT INTO `person` VALUES (3, 'liz');

Comments

  • phlypephlype Entrant Current User Role Beginner
    I added some more persons and preferences to the different tables so that there where about 10000 persons with random color and material preferences in the database; the example query resulted in 1900 persons; here are the timing results in seconds:
    option 1: 0.2
    option 2: 0.005
    option 3: 0.003
    option 4: 0.007
    option 5: 0.006
    option 6: 0.002

    where option 6 is a newly added option:
    SELECT pm.persid FROM persmaterial pm WHERE pm.materialid=2 AND pm.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=2 AND pc.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=1 OR pc.colorid=3))

    I was a bit surprised by the good timing results for option 2; is there any one who can explain the way MySQL tackles the query in option 2 (I suppose MySQL is not going over all records of table "person" launching a subquery for each record of table "person"?);

    I tend to conclude that option 2 is the way to go since it is so easy to translate the pseudo query into the MySQL query and on top of that the resulting MySQL query is rather fast!
  • phlypephlype Entrant Current User Role Beginner
    I added some more persons and preferences to the different tables so that there where about 10000 persons with random color and material preferences in the database; the example query resulted in 1900 persons; here are the timing results in seconds:
    option 1: 0.2
    option 2: 0.005
    option 3: 0.003
    option 4: 0.007
    option 5: 0.006
    option 6: 0.002

    where option 6 is a newly added option:
    SELECT pm.persid FROM persmaterial pm WHERE pm.materialid=2 AND pm.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=2 AND pc.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=1 OR pc.colorid=3))

    I was a bit surprised by the good timing results for option 2; is there any one who can explain the way MySQL tackles the query in option 2 (I suppose MySQL is not going over all records of table "person" launching a subquery for each record of table "person"?);

    I tend to conclude that option 2 is the way to go since it is so easy to translate the pseudo query into the MySQL query and on top of that the resulting MySQL query is rather fast!
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.