Tuning table indexes with large datasets and frequent index updates

Hi,

I’m doing some statistics tracking using a MyISAM table. The table currently has 8 million rows, with a 1.1GB .MYD file and a 700MB .MYI file. The server is a dual quad-core with 4GB of RAM, and I have a key buffer size of 1GB. The table experiences frequent updates, writes and selects (an even distribution).

The table itself has 8 columns, mostly of integer type except for a couple of char fields. Although I used MyISAM, the combination of selects/updates/inserts was faster than with InnoDB.

Since the rows are fixed length, inserts are quite fast. At the same time, I used the DELAY_KEY_WRITE option to speed up updates.
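For reference, this is roughly how I set it (the table name here is just a placeholder for my stats table):

    ALTER TABLE stats DELAY_KEY_WRITE = 1;
    # with DELAY_KEY_WRITE enabled, changed key blocks stay in the key buffer
    # and are only flushed to the .MYI file when the table is closed or on FLUSH TABLES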

However, the index does update often, as some of the columns that I sort by are updated (or added to) frequently. I assume this invalidates the key buffer. For example, if I do a select count() where name like ‘te%’, the first time is slow, but following queries are quite fast, even if I use a subpart of the key such as like ‘tes%’. However, 30 minutes after doing this, the initial select is slow again. Also, whenever the select count() query is slow, I see a tremendous amount of drive read activity, which I assume reflects the index being loaded into memory.
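One way to check whether the slow runs really are key cache misses would be to compare the key read counters before and after the query:

    SHOW STATUS LIKE 'Key_read%';
    # Key_read_requests = index block reads requested from the key buffer
    # Key_reads         = requests that had to go to disk instead
    # a jump in Key_reads during the slow run points at key cache misses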

What’s the best solution here? It seems I should ideally bump up the DB server’s RAM and have the entire index file in memory. Should I place the index file on its own RAM drive? RAM drives do have additional file system overhead - I wish a hybrid HEAP/MyISAM structure were possible, where the entire index was manipulated in RAM and the data stayed on disk, with the index rebuilt on database startup.

Is MySQL Cluster the best alternative here?

Following up on this scenario - I split the char fields into their own test table. This table doesn’t get updated right now. However, if I do a query on this table, for example select name from table where name like ‘a%’, it will be slow on the first run and quick on subsequent runs. However, if I wait a few minutes, run some other queries (like ‘b%’, etc.), and then rerun like ‘a%’, it seems to take just as long as the first time.
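If it helps, this is how I check that the prefix query can actually use the index (the table name is just my test name):

    EXPLAIN SELECT name FROM name_test WHERE name LIKE 'a%';
    # a "range" access type on the name index means the index is being used;
    # "ALL" would mean a full table scan, which would be a different problem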

Is this a sign that the key cache is insufficient and it’s being cleared out?

The char table has 4.4 million records, with about 250MB of index and 350MB total. My total key buffer is set at 1024MB.

Do not use a RAM drive for the index file!
Use the key_buffer_size parameter to specify how large the RAM cache for indexes should be.
With your setting, the index is mostly situated in memory already.
Combine this with the OS file cache, which the rest (3GB) of RAM is used for, and it is not worth trying to manage index placement with a RAM drive yourself.
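As a rough sketch, the setting goes in my.cnf and you can check how much of the buffer is actually in use (the numbers here are only examples):

    [mysqld]
    key_buffer_size = 1G

    # then, from the mysql client:
    SHOW STATUS LIKE 'Key_blocks_u%';
    # Key_blocks_used   = high-water mark of index blocks ever used
    # Key_blocks_unused = blocks still free in the key buffer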

The query_cache is usually only really effective on tables that do not change so frequently.
As you said yourself, changes invalidate the data in the query cache and the server needs to perform the query from the beginning.

The query cache has a setting for the largest result set that it should store, query_cache_limit.
The reason for this is that storing just one large result set in the query cache instead of a lot of small ones is counterproductive.
Most of the gain from the query cache comes from small queries, since for those the overhead of parsing the query etc. is relatively large. For queries that return large result sets, the time just to retrieve and send the data is so much larger that the time spent parsing the query etc. becomes negligible.
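For completeness, these are the two settings being discussed; the sizes below are only placeholders, not recommendations:

    [mysqld]
    query_cache_size  = 64M    # total memory reserved for cached result sets
    query_cache_limit = 1M     # result sets larger than this are never cached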

Your purpose for this thread is a bit vague because you don’t really reveal what your goal is.
Are you running other queries against this database?
What is the server doing between the times you run this query?

Because of the way caching usually works, it is the data that was used most recently that ends up in the cache.

Hi,

Thanks for the reply. I was not referring to the query_cache - I realize that gets invalidated. I was referring to the key buffer, and I assume it gets dirty and needs to be reloaded from the drive in its entirety if part of the index is updated (right? or only the updated parts?).
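If it helps, these counters should show what is actually going on with modified index blocks (this is my reading of the manual, so correct me if I’m wrong):

    SHOW STATUS LIKE 'Key_blocks_not_flushed';
    # index blocks modified in the key buffer but not yet written to disk
    SHOW STATUS LIKE 'Key_write%';
    # comparing Key_write_requests with Key_writes shows how often changed
    # blocks actually hit the disk rather than staying in the buffer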

Initially, I was recording millions of user names, when they were last seen online, and other such stats. I have split up the design so that the character part of the table is in its own table, since the names can be repeated. Like I mentioned in the earlier post, the name table is 4 million records and it’s not being used right now. The table has a BIGINT id primary key and a unique index on a varchar(100) “name” field.
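Roughly, the name table looks like this (column names simplified, and the AUTO_INCREMENT is just my shorthand for how the id gets filled):

    CREATE TABLE names (
      id   BIGINT NOT NULL AUTO_INCREMENT,
      name VARCHAR(100) NOT NULL,
      PRIMARY KEY (id),
      UNIQUE KEY idx_name (name)
    ) ENGINE=MyISAM;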

I can only reasonably assume that the reason the same identical query is slow later on is that the key buffer space initially holding the user name table’s index is freed up and given to other tables’ indexes via the least-recently-used algorithm. The database itself is busy processing 700 QPS. So it seems the only reasonable solution at this point is to increase the DB server’s 4GB of RAM and increase the key buffer - am I correct?
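One thing I’m also considering, if I understand the manual correctly, is giving the name table’s index its own named key cache so other tables can’t push it out, and preloading it at startup - something along these lines (cache name and size are just examples):

    SET GLOBAL name_cache.key_buffer_size = 256*1024*1024;  # a separate 256MB key cache
    CACHE INDEX names IN name_cache;    # assign this table's indexes to that cache
    LOAD INDEX INTO CACHE names;        # preload the index blocks into memory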

I am interested in an update on this issue - was it ever resolved?