Millions of rows... optimization help needed

The DB structure is very flexible, allowing for various types of widgets and each type having various fields/values associated.

For the sake of this question, assume that there are 3 types of widgets:

mysql> SELECT * FROM widgets_types;±-------±--------------------±--------------------+| typeid | typename | modified |±-------±--------------------±--------------------+| 1 | X Widgets | 2008-02-05 09:57:37 | | 2 | Y Widgets | 2008-02-05 09:58:39 | | 3 | Z Widgets | 2008-02-05 09:59:29 | ±-------±--------------------±--------------------+3 rows in set (0.02 sec)

There are various fields that can be associated with each widget type:

mysql> SELECT * FROM widgets_fields;±—±----------------------±------------+| id | fieldname | fieldtypeid |±—±----------------------±------------+| 1 | First Name | 001 | | 2 | Last Name | 001 | | 3 | Primary Phone | 001 | | 4 | E-mail | 001 | | 5 | Address | 001 | | 6 | City | 001 | | 7 | Zip/Postal Code | 001 | | 8 | State | 001 | ±—±----------------------±------------+8 rows in set (0.02 sec)

Each widget type has a different set of fields associated with it.

mysql> SELECT * FROM widgets_types_fields;±—±-------±--------±-------+| id | typeid | fieldid | active |±—±-------±--------±-------+| 1 | 1 | 1 | 1 || 2 | 1 | 2 | 1 || 3 | 1 | 3 | 1 || 12 | 2 | 1 | 1 || 13 | 2 | 2 | 1 || 15 | 2 | 5 | 1 || 16 | 2 | 8 | 1 || 23 | 3 | 1 | 1 || 24 | 3 | 2 | 1 || 25 | 3 | 3 | 1 || 26 | 3 | 6 | 1 || 27 | 3 | 8 | 1 |±—±-------±--------±-------+12 rows in set (0.21 sec)

The dynamic capability of this DB structure needs to be retained as new widget types will get added and different fields will be associated with different widget types.

** due to the required flexibility, it won’t be possible to have a single table widget type with fixed columns for each field for that type **

mysql> describe widgets;±-----------------±--------------------------±-----±----±------------------±---------------+| Field | Type | Null | Key | Default | Extra |±-----------------±--------------------------±-----±----±------------------±---------------+| id | int(8) unsigned | NO | PRI | NULL | auto_increment | | typeid | int(8) unsigned | NO | MUL | 0 | | | start_price | decimal(5,2) | NO | | NULL | | | current_price | decimal(5,2) | NO | | NULL | | | sale_price | decimal(5,2) | NO | | NULL | | | ts_sold | int(10) unsigned zerofill | NO | | 0000000000 | | | modified | timestamp | NO | | CURRENT_TIMESTAMP | | ±-----------------±--------------------------±-----±----±------------------±---------------+7 rows in set (0.00 sec)mysql> show index from widgets;±--------±-----------±-----------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |±--------±-----------±-----------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| widgets | 0 | PRIMARY | 1 | id | A | 2316349 | NULL | NULL | | BTREE | | | widgets | 1 | idx_typeid | 1 | typeid | A | 18 | NULL | NULL | | BTREE | | ±--------±-----------±-----------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+2 rows in set (2.19 sec)mysql> describe widget_values;±---------±----------------±-----±----±------------------±---------------+| Field | Type | Null | Key | Default | Extra |±---------±----------------±-----±----±------------------±---------------+| id | int(8) unsigned | NO | PRI | NULL | auto_increment | | widgetid | int(8) unsigned | NO | MUL | NULL | | | fieldid | int(8) unsigned | NO | MUL | NULL | | | valchar | varchar(255) | NO | | NULL | | | modified | timestamp | NO | | CURRENT_TIMESTAMP | | ±---------±----------------±-----±----±------------------±---------------+5 rows in set (0.00 sec)mysql> show index from widget_values;±--------------±-----------±-----------------------------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |±--------------±-----------±-----------------------------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+| widget_values | 0 | PRIMARY | 1 | id | A | 17488933 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_widgetid_fieldid | 1 | widgetid | A | 5829644 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_widgetid_fieldid | 2 | fieldid | A | 17488933 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_fieldid_valchar | 1 | fieldid | A | 17 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_fieldid_valchar | 2 | valchar | A | 5829644 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_widgetid_fieldid_valchar | 1 | widgetid | A | 5829644 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_widgetid_fieldid_valchar | 2 | fieldid | A | 17488933 | NULL | NULL | | BTREE | | | widget_values | 1 | idx_widgetid_fieldid_valchar | 3 | valchar | A | 17488933 | NULL | NULL | | BTREE | | ±--------------±-----------±-----------------------------±-------------±------------±----------±------------±---------±-------±-----±-----------±--------+8 rows in set (17.89 sec)

The problem:
There are millions of rows in the widgets table, and even more in widget_values.

I need to be able to select matching widgets, such as:
Scenario 1:

  • typeid = 3
  • fieldid 8 has value ‘CA’
  • fieldid 1 has value LIKE ‘%john%’

mysql> EXPLAIN SELECT widgetid FROM widget_values WHERE widgetid IN (SELECT widgetid FROM widget_values WHERE fieldid = ‘8’ AND valchar = ‘CA’) AND widgetid IN (SELECT widgetid FROM widgets WHERE typeid = ‘3’) AND fieldid = ‘1’ AND valchar LIKE ‘%john%’;±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±-----------±--------±-------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±-----------±--------±-------------------------+| 1 | PRIMARY | widget_values | ref | idx_fieldid_valchar | idx_fieldid_valchar | 4 | const | 5051360 | Using where | | 3 | DEPENDENT SUBQUERY | widgets | ref | idx_typeid | idx_typeid | 4 | const | 917832 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | widget_values | index_subquery | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | func,const | 1 | Using where | ±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±-----------±--------±-------------------------+3 rows in set (0.32 sec)

and running the actual query:

1887 rows in set (3 min 6.83 sec)

SHOW INNODB STATUS while running the query showed this: view

Scenario 2:

  • typeid = 3
  • fieldid 7 = 90210
  • fieldid 8 = ‘CA’

mysql> EXPLAIN SELECT widgetid FROM widget_values WHERE widgetid IN (SELECT widgetid FROM widget_values WHERE fieldid = ‘8’ AND valchar = ‘CA’) AND widgetid IN (SELECT widgetid FROM widgets WHERE typeid = ‘3’) AND fieldid = ‘7’ AND valchar = ‘90210’;±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±------------±-------±-------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±------------±-------±-------------------------+| 1 | PRIMARY | widget_values | ref | idx_fieldid_valchar | idx_fieldid_valchar | 261 | const,const | 464 | Using where | | 3 | DEPENDENT SUBQUERY | widgets | ref | idx_typeid | idx_typeid | 4 | const | 917832 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | widget_values | index_subquery | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | func,const | 1 | Using where | ±—±-------------------±--------------±---------------±----------------------------------------------------------------------±----------------------±--------±------------±-------±-------------------------+3 rows in set (0.60 sec)

and running the actual query:

181 rows in set (0.10 sec)

Scenario 3:

  • typeid = 3
  • fieldid 1 LIKE ‘%angel%’
  • fieldid 2 LIKE ‘%goff%’

mysql> EXPLAIN SELECT lv1.widgetid FROM widget_values AS lv1, widget_values AS lv2, widgets WHERE widgets.typeid = 3 AND lv1.fieldid = 1 AND lv1.valchar LIKE ‘%angel%’ AND lv1.widgetid = widgets.id AND lv2.fieldid = 2 AND lv2.valchar LIKE ‘%goff%’ AND lv2.widgetid = widgets.id; ±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±-------------------------±-------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±-------------------------±-------±------------+| 1 | SIMPLE | widgets | ref | PRIMARY,idx_typeid | idx_typeid | 4 | const | 917832 | Using index | | 1 | SIMPLE | lv1 | ref | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | widgetr.widgets.id,const | 1 | Using where | | 1 | SIMPLE | lv2 | ref | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | widgetr.widgets.id,const | 1 | Using where | ±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±-------------------------±-------±------------+3 rows in set (0.00 sec)

and running the actual query:

47 rows in set (5 min 17.17 sec)

The ultimate goal is to get the matching widgetid’s as fast as possible. Eagerly awaiting feedback from you all…

Hardware: Q6600 2.4Ghz & 4GB memory
show variables output: view

Thanks in advance.

Making some progress:
EXPLAIN SELECT widgets.id FROM widgets INNER JOIN widget_values lv1 ON lv1.widgetid = widgets.id INNER JOIN widget_values lv2 ON lv2.widgetid = widgets.id WHERE widgets.typeid = 3 AND lv1.fieldid = 1 AND lv1.valchar LIKE ‘%b%’ AND lv2.fieldid = 2 AND lv2.valchar LIKE ‘%jo%’;

±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±---------------------------±-------±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±---------------------------±-------±------------+| 1 | SIMPLE | widgets | ref | PRIMARY,idx_typeid | idx_typeid | 4 | const | 917832 | Using index | | 1 | SIMPLE | lv1 | ref | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | widgetr.widgets.id,const | 1 | Using where | | 1 | SIMPLE | lv2 | ref | idx_widgetid_fieldid,idx_fieldid_valchar,idx_widgetid_fieldid_valchar | idx_widgetid_fieldid | 8 | widgetr.lv1.widgetid,const | 1 | Using where | ±—±------------±--------±-----±----------------------------------------------------------------------±---------------------±--------±---------------------------±-------±------------+3 rows in set (0.01 sec)

produces results in:

9584 rows in set (53.49 sec)

5min down to less than a min… not bad… but it’s possible to get better, isn’t it? Help! (please…) =)

You may want to try to avoid using the ‘%term%’ syntax in favor of

  1. Using ‘term%’ - it is very difficult to quickly do a search using a wild-card at the start of a search string. I’ve read that it is like looking for a string in the phone book. If you do not have a wild-card at the front, you can go straight go straight to the right place in the phone book. If you do have a wild-card, you have to scan every single entry - there is not a really efficient way to do it. The results will be different, but that might be acceptable to you.

  2. Building a fulltext index and using the matching() using … syntax for fulltext searches. This will only search on the whole word and vastly speed up your queries.

Hope that helps -

Jon