How much RAM do you have on the server, and how much have you reserved for the InnoDB buffer pool (innodb_buffer_pool_size)?
That is very important for getting good speed out of InnoDB.
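If you are not sure what it is currently set to, a quick way to check from the MySQL prompt (just the standard variable and status names, nothing server-specific) is:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
mysql> SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';

The first shows how much memory the pool has been given, the second how many of its pages are actually holding data versus sitting free.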
Apart from that I have a suggestion:
select Paragraph from WikiParagraphs where id = 1
UNION
select Paragraph from WikiParagraphs where id = 4
UNION
select Paragraph from WikiParagraphs where id = 6
That will give a const lookup for each SELECT instead of a range scan, which might be faster if the smallest and biggest values in the IN() list are far apart and you don't want to retrieve that many rows.
mysql> explain select Paragraph from WikiParagraphs where id = 3
    -> UNION
    -> select Paragraph from WikiParagraphs where id = 4
    -> UNION
    -> select Paragraph from WikiParagraphs where id = 6
    -> ;
+------+--------------+----------------+-------+---------------+---------+---------+-------+------+-------+
| id   | select_type  | table          | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+------+--------------+----------------+-------+---------------+---------+---------+-------+------+-------+
|    1 | PRIMARY      | WikiParagraphs | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
|    2 | UNION        | WikiParagraphs | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
|    3 | UNION        | WikiParagraphs | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
| NULL | UNION RESULT | <union1,2,3>   | ALL   | NULL          | NULL    | NULL    | NULL  | NULL |       |
+------+--------------+----------------+-------+---------------+---------+---------+-------+------+-------+
4 rows in set (0.00 sec)
It’s not faster if you break it into 50 or 100.
The only advantage is that you could stop once you have the data you need instead of reading in all 1000 rows.
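Just to show the shape of that, a minimal sketch with tiny chunks of three IDs each (in practice each chunk would hold 50 or 100 of the IDs you actually want):

-- first chunk
select Paragraph from WikiParagraphs where ID in (1,2,3);
-- only run the next chunk if you still need more rows
select Paragraph from WikiParagraphs where ID in (4,5,6);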
But you are only using 134MB for innodb_buffer_pool_size.
How much memory do you have available on the server?
And are you using the server for something else like webserver or something?
Because I would suggest that you increase the innodb_buffer_pool_size to 80% of available memory.
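As a rough sketch, on a dedicated database server with 4GB of RAM that could look something like this in my.cnf (the 3G figure is just an example; use whatever roughly 80% of your actual free memory is, and restart mysqld afterwards):

[mysqld]
innodb_buffer_pool_size = 3G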
I’m actually not using transactions at all right now nor do I expect to be with this data.
The only reason I added InnoDB was because I figured it would help to not lock the entire table when creating these 13-25GB tables. Now that they're created the point is kinda moot, but converting it back to MyISAM (reimporting the data) is likely to take a week or more, unless you know a quick way to do it?
Fulltext is done via the Sphinx engine, so I'm not using that part of MyISAM either.
Are you performing a lot of updates against the table?
Or is it basically only selects?
I’m guessing selects since it looks like a wiki, but you never know. :)
This is the quickest way (but you don’t have any control over it):
ALTER TABLE yourTable ENGINE=MyISAM;
The only question is how long your server will take to convert it.
So that’s why I suggest that you create a smaller table with maybe 500MB data and test on that one instead so you can get a feel for how long it takes.
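A minimal sketch of how you could build such a test table (the table name and the 100000-row cutoff are just assumptions; pick whatever gives you roughly 500MB):

CREATE TABLE WikiParagraphs_test LIKE WikiParagraphs;
INSERT INTO WikiParagraphs_test SELECT * FROM WikiParagraphs WHERE id <= 100000;
-- then time the conversion on the copy
ALTER TABLE WikiParagraphs_test ENGINE=MyISAM;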
And depending on the time estimate from your test it could be better to perform the move yourself:
1. Create a MyISAM copy of your table, without indexes. Those will be added later.
2. INSERT INTO yourMyISAMCopy SELECT * FROM yourInnoDBoriginal
3. Create the indexes. And make sure that you give MySQL a lot of memory for this step, because creating an index can work in two ways and one is very slow when it has to work against disk all the time.
Step 2 could be split into portions by adding
… WHERE id BETWEEN 1001 AND 2000
… WHERE id BETWEEN 2001 AND 3000
or whatever range size is appropriate for how many rows you want to move at a time, if you can’t take it all in one go (see the sketch below).
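Putting the whole thing together, a rough sketch (table name, column definitions, the BETWEEN ranges and the sort buffer size are all just example values; adjust them to your real schema and memory):

-- 1. MyISAM copy without indexes (assumed schema: an ID plus the paragraph text)
CREATE TABLE WikiParagraphs_myisam (
  ID INT NOT NULL,
  Paragraph TEXT
) ENGINE=MyISAM;

-- 2. move the data across in portions
INSERT INTO WikiParagraphs_myisam SELECT * FROM WikiParagraphs WHERE ID BETWEEN 1 AND 1000;
INSERT INTO WikiParagraphs_myisam SELECT * FROM WikiParagraphs WHERE ID BETWEEN 1001 AND 2000;
-- ... repeat for the remaining ranges

-- 3. give the index build plenty of memory for this session, then add the indexes
SET SESSION myisam_sort_buffer_size = 256*1024*1024;
ALTER TABLE WikiParagraphs_myisam ADD PRIMARY KEY (ID);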
I really wouldn’t recommend the ALTER TABLE; talking from experience (50 gig table), it will be slow and crummy!
I do queries similar to:
select Paragraph from WikiParagraphs where ID in (1,2,3,4 … 1000)
and I’m looking at subsecond queries!
I think you should increase the buffers you’re applying to these tables sufficiently!
One thing that will help you is to do all the matching with Sphinx, do any PHP-side ordering of the data, and then only pull the data you actually need! This should help greatly, in terms of speed at least.
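For example, if Sphinx hands you the matching IDs already ranked, a minimal sketch of pulling just the page of rows you will actually display while keeping that ranking (the ID values are placeholders for whatever Sphinx returned):

-- fetch only the 10 rows to be shown, in the order Sphinx ranked them
select Paragraph
from WikiParagraphs
where ID in (42, 17, 93, 5, 61, 8, 70, 12, 33, 54)
order by FIELD(ID, 42, 17, 93, 5, 61, 8, 70, 12, 33, 54);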
ALSO, I might add I tested with MyISAM and InnoDB separately, and saw pretty much no difference between load times selecting 1000 rows using SELECT … FROM … WHERE … IN (1,2 …1000)
Just giving you the heads up, I’d say it’s HIGHLY related to your buffer pool.
Yeah, I figure you’re right, but having the memory eaten by MyISAM as well as InnoDB is kind of a waste. So I’m gonna switch to MyISAM across the board to make sure that I can optimize for one engine instead of two! :)
select Paragraph from WikiParagraphs where id = 1
UNION
select Paragraph from WikiParagraphs where id = 4
UNION
select Paragraph from WikiParagraphs where id = 6
It’s better to use UNION ALL when you know the result set will not contain any duplicates.
By default UNION implies DISTINCT (i.e., it attempts to remove all duplicate rows from the result set).
I doubt this would make much of a difference in this case, but I thought it was worth mentioning :)
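For reference, the same query with UNION ALL, which skips the duplicate-removal step (same placeholder IDs as above):

select Paragraph from WikiParagraphs where id = 1
UNION ALL
select Paragraph from WikiParagraphs where id = 4
UNION ALL
select Paragraph from WikiParagraphs where id = 6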
What did you actually mean by “It drops about 40 million rows…”?
The INSERT INTO … SELECT * FROM …;
is very straightforward and I have never experienced it failing.
It might take a long time to perform sometimes, and that can be a problem. But that is why I suggested that you move the data in portions, where you select a range of primary keys to transfer each time you perform the query.