Key Cache Behaviour

Hi,

I’ve set up a new website and am currently preparing some “load” testing for a couple of search queries.

I’ve seen a behaviour that I cannot explain, so maybe you guys have some answers?

We are running MySQL 5.0.30 on a 64-bit Debian Etch system. The server has two Xeon CPUs and 8 GB of RAM. We mostly use MyISAM tables, so I’ve set the key_buffer to the maximum of 4 GB.

Next, there are 31 tables in a separate database with 111 million rows overall. Each table has an average index size of 100 MB (data size 170 MB). The data is split across the 31 tables as a form of partitioning (so we don’t have to deal with MyISAM locking issues too much).

The table structures are identical:

userID1 (int)
userID2 (int)
userVote1 (tinyint)
userVote2 (tinyint)
voteDate1 (datetime)
voteDate2 (datetime)

and only one primary key (index) on the fields (userID1, userID2).

Now our application typically looks up records where userID1 is given, and sometimes userVote1 or userVote2 is part of the condition as well, e.g.

SELECT userID2 FROM table_1 WHERE userID1 = 100 AND userVote1=2

So far so good. The first time a query like the one above is run, it typically takes about 5-10 seconds (sometimes 10-20 seconds). Each subsequent query for the same userID (in the example, ‘100’) then takes only about 0.2-0.4 seconds.

This is not due to the query cache (it’s still the same when we use SQL_NO_CACHE). I suspect the performance increase is a result of the key cache being loaded for that table.
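(That is, the timing stays the same even for something along these lines:)

SELECT SQL_NO_CACHE userID2 FROM table_1 WHERE userID1 = 100 AND userVote1 = 2;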

When I send random queries for userIDs between 1 and 10000 (which are stored in table_1), after a couple of queries every following query needs only 2-5 seconds. So that’s fine (and again, I guess, a result of the key cache).

But: when I start to spread the queries across all of the tables (all 31, userID1 between 1 and 300000), there’s no such performance increase, although there should be enough memory for MySQL to hold all keys (overall size about 3.1 GB) in RAM.

Do you have any idea why the “key cache” effect only works when I spread the queries across about 5 tables? Using a separate key cache (SET GLOBAL keycache bla bla) doesn’t help either.
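(What I tried was roughly along these lines; the cache name and size are just placeholders:)

SET GLOBAL test_cache.key_buffer_size = 512*1024*1024;
CACHE INDEX table_1 IN test_cache;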

I’ve also tried LOAD INDEX INTO CACHE table1, table2… but that doesn’t help either.

Does anyone have an idea? Our variables regarding key_cache are:

Key_blocks_not_flushed 0
Key_blocks_unused 1.646 k
Key_blocks_used 418 k
Key_read_requests 2.429 M
Key_reads 88 M
Key_write_requests 56 M
Key_writes 36 M
Key_buffer_fraction_% 19,64 %

Would the “Midpoint Insertion Strategy” help?
http://dev.mysql.com/doc/refman/5.1/en/midpoint-insertion.html

If so, what would you suggest setting the key_cache_division_limit value to?

I appreciate any help.
Greetings
Christian

Your key_buffer_size setting of 4 GB sounds very high.
Unless you know that most of your queries only use indexes (both for finding and retrieving the data), you should lower this setting to about 25% of the available RAM, so that more RAM can be used for the OS file cache, which MyISAM relies on.
The OS file cache is probably the reason why you are getting fast reads when reading the same value repeatedly and slow reads when you scatter the reads all over the table.
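As a rough sketch for an 8 GB machine (this is just the 25% rule of thumb; adjust for the InnoDB buffer pool and whatever else runs on the box):

# my.cnf: keep the key cache moderate and leave most RAM to the OS file cache,
# which MyISAM relies on for the data (non-index) reads
key_buffer_size = 2048M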

For the tuning part:
If you only have one index (the primary key) on the columns (userID1, userID2) and you issue queries like:

SELECT userID2 FROM table_1 WHERE userID1 = 100 AND userVote1=2

It will use the index to find which rows match the userID1 criterion.
BUT it will have to jump to the table for each of these records to check the userVote1=2 criterion and to get the userID2 data.
And since only the index is stored in the key_buffer, the table accesses will be slower if you set it too high.

What you can do to speed up a query like this is to add a combined index on both columns that are AND’ed together in the WHERE clause:

(userID1, userVote1)

That way this index can be used to find exactly the records that match this condition, without having to scan rows in the table.
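For example (the index name is just a placeholder):

ALTER TABLE table_1 ADD INDEX idx_user1_vote1 (userID1, userVote1);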

And there is even one more optimization that can be used:
if you add the column that you are selecting as the last column of the index.

(userID1, userVote1, userID2)

The optimizer figures out that it doesn’t even have to access the table to return the result.

Meaning it only reads the index, and if most of your queries fit into this category then you can possibly increase the key_buffer_size to 50%.
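A quick way to verify this is EXPLAIN (the index name below is just a placeholder): if the Extra column shows “Using index”, the result is produced from the index alone, without touching the data file.

ALTER TABLE table_1 ADD INDEX idx_cover (userID1, userVote1, userID2);
EXPLAIN SELECT userID2 FROM table_1 WHERE userID1 = 100 AND userVote1 = 2;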

Dear Sterin,

thanks for your fast response. Your explanation regarding the OS file cache sounded very reasonable, so I tested my “benchmark” script with a key_buffer size of 2 GB. Unfortunately that didn’t help a lot. :) As I told you it’s an 8 GB machine, so I’ve set the following variables in my.cnf:

innodb_buffer_pool_size = 1024M (for the couple of InnoDB tables we use)
key_buffer = 2000M
myisam_sort_buffer_size = 1024M

So the OS has around 4 GB for file caching. Do you think decreasing one of these MySQL variables further would still improve performance? My feeling is that 4 GB should really be enough :)

For the tuning part: in fact, it’s a little more complicated than what I’ve posted. :) In the production system there will be an inner join against a users table (with about 300,000 records), but I’d still concentrate on the table definition I’ve posted, because each of those tables has around 3-7 million records. So that is the “bottleneck”, in fact. :)

Unfortunately, in the production system there will be more queries than the one I’ve posted, namely:

The main queries on those large tables (about 80%) will be:

SELECT userVote1, userVote2 FROM table_x WHERE userID1=CONST AND userID2=CONST

(returning at most one record)

Those queries take only a couple of milliseconds, so I don’t care whether they read from the key cache only or additionally from the data file.

But there are a couple of frequently used (20%) search queries, which often produce a large result of 200-4000 records and usually take about 2-10 seconds. BTW: when benchmarking those queries I used SELECT COUNT(*) instead of fetching each record, so there’s no overhead for the transfer to the client.

SELECT userID2 FROM table_x WHERE userID1 = CONST AND userVote1 = 2 AND userVote2 = 2

SELECT userID2 FROM table_x WHERE userID1 = CONST AND userVote2 IN (1,2)

SELECT userID2 FROM table_x WHERE userID1 = CONST AND userVote1 = 2

SELECT userID2 FROM table_x WHERE userID1 = CONST AND userVote1 = 0 AND userVote2 = 2

So, for “optimal” performance (index-only queries) I would need the following indexes, right?

(userID1, userID2, userVote1, userVote2)

(userID1, userVote2, userID2)

(userID1, userVote1, userID2)

(userID1, userVote1, userVote2, userID2)
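In DDL that would be something like this (the index names are just made up; putting all the ADD INDEX clauses into one ALTER should mean each table only gets rebuilt once):

ALTER TABLE table_x
  ADD INDEX idx_u2_v1_v2 (userID1, userID2, userVote1, userVote2),
  ADD INDEX idx_v2_u2 (userID1, userVote2, userID2),
  ADD INDEX idx_v1_u2 (userID1, userVote1, userID2),
  ADD INDEX idx_v1_v2_u2 (userID1, userVote1, userVote2, userID2);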

The only downside is that each table has about 2-7 million records. Creating such additional indexes would increase the index file size for the largest table from currently 112 MB to, I guess, at least around 500 MB.

With 31 tables it would increase the overall key size significantly, so that the indexes definitely would not fit into the key_buffer.

Additionally, in case MySQL crashed, it would take 5 times longer to repair the index files (which currently takes about 2 hours for the whole database).

Do you still think I would benefit from adding those indexes? Is the performance increase for index-only reads large enough compared to having a small index that fits into RAM (and which is used for all search queries but still needs a data file lookup)?

Are there any numbers out there on how long it takes MySQL to read non-indexed information from the data file? I thought that when MySQL uses the index file, the lookup wouldn’t take that much time?

Are there tools out there that benchmark how long MySQL needs for reading from the data file? I think there should be some kind of “BENCHMARK SELECT …” where MySQL shows how much time each step took. :) Or does something like that already exist and I simply don’t know about it? :)

Damn, sorry for the long post. I really overlooked that I should first set up the indexes properly for the queries I use often. Of course I should add the (userID1, voteDate1, …) and so on stuff. :)

But one question remains: is it better to keep the indexes short (e.g. only index the columns in the WHERE part), so that possibly more of the index can be held in memory?

Or should I add every index combination, so that the whole query could be resolved by just looking at the indexes, with the downside that the index size would be significantly higher?

E.g. for the queries above I would set the following indexes (so that the WHERE lookup can be resolved from the indexes alone, with an additional lookup in the data file still needed):

1: (userID1, userVote1, userVote2)
2: (userID1, userVote2)

Or should I add a separate index for each of the 5 possible queries, e.g.

1: (userID1, userVote1, userVote2)
2: (userID1, userVote2, userID2, voteDate2)
3: (userID1, userVote1, userID2, voteDate1)

?

Last but not least, another question :) The second fields in the indexes (userVote1 and userVote2) will only have three possible values: 0, 1, or 2, with a distribution of about

50%: 0
10%: 2
40%: 1

Does it make sense to add an index on that field at all, given the distribution of the possible values? (That’s why I “forgot” to set up indexes on that column.)

It looks like you also like to write a lot :wink:

Of course, it usually is more complicated than it seems :wink:

But tuning is a matter of balance.
Adding good indexes for the queries that are executed often will give you more benefit than adding them for queries that are rarely executed.

At the same time some queries might be unbearably slow without the proper indexes.

I think I would choose to have the userVote column last in the index, because in the worst case it will only reduce the scan by 50%, but that is still 4000 fewer rows according to your information.
And the bigger the tables are, the more it comes down to minimizing disk seek times.

But it is true that if you want to keep the index size down, you can leave the columns from the SELECT part out of the index.
You will gain more by having all the WHERE columns in the index than by also having the SELECT columns in it.

I thought about another thing:
what is your table cache set to, and does your Opened_tables status variable increase?
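Something like the following shows both (GLOBAL gives the server-wide counter rather than the current session’s):

SHOW VARIABLES LIKE 'table_cache';
SHOW GLOBAL STATUS LIKE 'Opened_tables';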

But as usual you should always test every scenario, because in database optimization there is no absolute right or wrong.
Because performance is so tightly coupled to the application’s query pattern and the data stored in the database, one can only give general advice about what to tweak.

Dear Sterin,

OK, I’m trying to keep this post short. :) Regarding your question about the open tables: table_cache is set to 64.

show status like '%_tables%';

shows

Open_tables | 64
Opened_tables | 0

(although phpMyAdmin shows 8.762 k for opened tables !?)

Regarding index tuning: yes, I know that I should experiment with different indexes. Unfortunately the system is already in production (I’m planning a relaunch in the next 10 days). The tables are not yet used for the search queries, but the existing system already inserts data into them.

Adding/removing indexes on all 31 tables (with 111 million rows) is really a big task for MySQL, so I guess it would take a whole day just to add the indexes you suggested.

“Playing around” with other indexes therefore doesn’t seem very easy. Although I could use a replicated slave for that, it would not give real-life hints (because I don’t yet have a spare server with similar hardware). :)

Blocking the production system for about 8 hours is not a good idea. :)

BTW: do you think I would benefit from changing the tables to InnoDB?

It was a bit odd that you had 0 opened tables, because that is basically not possible: if you have selected something from a table at least once, the counter should have been incremented by one step.

What I thought was that since you have at least 31 tables, and each client connection takes an additional 31 file descriptors if more than one client is connected, the file descriptors tend to run out pretty fast.

So check again whether the Opened_tables status is increasing, and if so, increase the table_cache variable.
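For example (256 is just a test value; size it roughly to the number of regularly used tables times the number of concurrent connections):

# my.cnf
table_cache = 256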

Dear Sterin,

hmm, I didn’t know that I have to “use database” before the “show status like '%opened%'” query.

However, you are right. There are actually about 120 tables in the database, and about 100 will be accessed on a regular basis. So I decided to increase table_cache to 256 (just for testing purposes), and everything works a little faster now. I didn’t know that MySQL kills the key cache for a table that is not actually “opened” (I’ve read this; is that right?)

Anyway, I started to add indexes yesterday, and now the queries take only about 0.1-1.0 seconds, which is really much better than before. So, Sterin, thanks for that hint. Too bad that I personally thought adding an index on a 50/30/20-distributed column would not add much performance. :)

[B]Quote:[/B]

hmm, I didn’t know that I have to “use database” before the “show status like '%opened%'” query.

OK, so that was the reason.
[B]Quote:[/B]

However, you are right. There are actually about 120 tables in the database, and about 100 will be accessed on a regular basis. So I decided to increase table_cache to 256 (just for testing purposes), and everything works a little faster now. I didn’t know that MySQL kills the key cache for a table that is not actually “opened” (I’ve read this; is that right?)

I've never heard of that. The MyISAM key cache uses a least-recently-used strategy, which means that index blocks that are not used will automatically get evicted from the key cache.

But the table cache closes unused file handles when the cache is full instead of reusing previously opened ones.
So to avoid that overhead you can use a higher table_cache setting.

[B]Quote:[/B]

Anyway, I started to add indexes yesterday, and now the queries take only about 0.1-1.0 seconds, which is really much better than before. So, Sterin, thanks for that hint. Too bad that I personally thought adding an index on a 50/30/20-distributed column would not add much performance.

Well, if you have small tables that can basically be cached in RAM, then it wouldn't make much of a difference. But the more the DB size/RAM size ratio increases, the more important it becomes, since each disk seek in the world of databases takes a _very_ long time. ;)