Slow queries on many tables with a large number of rows

I had a single table holding an n:m relationship between two big tables:

PK_TABLE1 varchar(100)
PK_TABLE2 int
INFO1 varchar(30)
INFO2 date
INFO3 int
UNIQUE KEY unq_t1_pk2 (PK_TABLE1, PK_TABLE2)
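
Spelled out as DDL, the table would have looked roughly like this (a sketch: LINK_TABLE is a placeholder name, and NOT NULL on the key columns plus ENGINE=MyISAM are assumptions, the latter implied by the .MYD/.MYI files mentioned below):

CREATE TABLE LINK_TABLE (
    PK_TABLE1 varchar(100) NOT NULL,  -- key of the first big table (assumed NOT NULL)
    PK_TABLE2 int NOT NULL,           -- key of the second big table (assumed NOT NULL)
    INFO1     varchar(30),
    INFO2     date,
    INFO3     int,
    UNIQUE KEY unq_t1_pk2 (PK_TABLE1, PK_TABLE2)
) ENGINE=MyISAM;                      -- MyISAM implied by the .MYD/.MYI files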

Since it was growing very large as a single table, I divided it horizontally into 100 tables based on PK_TABLE2.
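
As an example of the layout (assuming a simple modulo-100 scheme, which at least matches the query below, where PK_TABLE2 = 171839 is looked up in BIG_TBL_39):

SELECT CONCAT('BIG_TBL_', 171839 % 100);  -- -> 'BIG_TBL_39', since 171839 mod 100 = 39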

Queries were running fine until a few days ago. Now the machine stays heavily loaded and the queries take far too long.

Each table has around 8,000,000 rows, a ~220 MB .MYD (data) file, and a ~200 MB .MYI (index) file.

My my.cnf looks something like this:

--snip--
skip-locking
key_buffer = 1500M
back_log = 150
table_cache = 512
sort_buffer_size = 256K
read_buffer_size = 1M
max_connections = 1000
long_query_time = 5
thread_cache = 200
query_cache_size= 512M
query_cache_limit= 2M
--snip--
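
Rough arithmetic on what these settings can commit on a 6 GB machine (only the buffers shown above are counted; sort_buffer_size and read_buffer_size are per connection):

global buffers:  key_buffer (1500 MB) + query_cache_size (512 MB)    ≈ 2.0 GB
per connection:  sort_buffer_size (256 KB) + read_buffer_size (1 MB) ≈ 1.25 MB
worst case:      2.0 GB + 1000 connections x 1.25 MB                 ≈ 3.2 GB

On MyISAM the key_buffer caches only indexes; the .MYD data files depend on the OS file cache, so RAM committed to these buffers is no longer available for caching data.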

The queries are of the following form:

mysql> explain SELECT * FROM BIG_TBL_39 WHERE PK_TABLE2='171839' and PK_TABLE1 in (binary 'variable1', binary 'variable2', binary 'var3', binary 'var4', binary 'var5', binary 'var6', binary 'var7', binary 'var8', binary 'var9', binary 'var10');
+----+-------------+------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+------------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | BIG_TBL_39 | range | unq_t1_pk2    | unq_t1_pk2 |      46 | NULL |   20 | Using where |
+----+-------------+------------+-------+---------------+------------+---------+------+------+-------------+
1 row in set (0.04 sec)

Any suggestions?

I am using MySQL 5.1.22 on a dedicated machine with 6 GB of RAM and two hyper-threaded Intel Xeon CPUs.

Maybe you have to do an "ANALYZE TABLE BIG_TBL_39"?
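
That is, something like the following; for MyISAM tables it rebuilds the key distribution statistics the optimizer relies on:

mysql> ANALYZE TABLE BIG_TBL_39;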

What is the output of "SHOW INDEXES FROM BIG_TBL_39", especially the cardinality (as opposed to the number of rows)?

mysql> show indexes from BIG_TBL_39;
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table      | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| BIG_TBL_39 |          0 | unq_t1_pk2 |            1 | PK_TABLE1   | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| BIG_TBL_39 |          0 | unq_t1_pk2 |            2 | PK_TABLE2   | A         |     9556205 |     NULL | NULL   |      | BTREE      |         |
+------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.07 sec)

This key is a unique key and cannot be removed; it is what maintains the consistency of the data.

One odd thing: PK_TABLE1, the primary key of the first table, is a VARCHAR. Ideally an INT would be used for this, but I guess that is hard to change now.

The query uses an exact value for PK_TABLE2 while applying IN(...) to PK_TABLE1.
This leads me to believe that you would probably be better off with the combined index in (PK_TABLE2, PK_TABLE1) order instead of your current (PK_TABLE1, PK_TABLE2).

The IN() on the first column of the index probably results in a range scan over the index, so you haven't really eliminated any records. If the exact value of PK_TABLE2 can be used first to narrow down the records to range-scan, it should be quicker.

I would test switching the order of the columns in the unique index to see whether it speeds up this query.
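
A sketch of that test (unq_pk2_t1 is just a placeholder name; rebuilding an index on a table this size takes a while, so try it on a copy first):

ALTER TABLE BIG_TBL_39
    DROP INDEX unq_t1_pk2,
    ADD UNIQUE KEY unq_pk2_t1 (PK_TABLE2, PK_TABLE1);

Then re-run the EXPLAIN and compare rows and key_len against the output above.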