Online Index Rebuild

Hi,

I know that MySQL drops-and-creates the table when doing an alter table of any kind.
Also, i know there is no embedded way to drop and create an index online, to rebuild it.

Is there a method someone here uses that can duplicate this behavior in a reliable way? Meaning, create another table, somehow copy diffs there, and then apply them, without the application being aware of that? I’ve googled and searched the forum, can’t find anything about this.

Thanks in advance,
Shlomo

It’s a bit hard to understand what you are after.
You can drop an index and re-create it only the table is locked for writes during this time.

As for your second table tactic I don’t really get what you are after.

You could do the changes on a dev server and then use this tool I created to replace the table live.

[URL]Google Code Archive - Long-term storage for Google Code Project Hosting.

I’ll elaborate, sorry if this was a bit foggy )

Lets say I’ve got a table with millions of records, and around 20-50 GB in size.
More importantly, it needs to be always on-line (or at least, 99.999% of the time).
The table receives a lot of writes as well as reads.
Regular maintenance on this type of table is nearly impossible since an index rebuild/optimize locks the entire table for a too long time.

In databases such as SQL Server or Oracle there is an option to rebuild index “ONLINE” so that the table is not locked and available as usual.

What “tricks” are there in MySQL to do maintenance such as rebuilds, without stopping the regular flow of work in the application?

Thanks again,
Shlomo

This a cool tool, but i’m afraid it’s not suitable for me since:

  1. We’re using InnoDB (it’s moving .MYI and .MYD files around)
  2. The table is not read-only

Am I missing something?

Nope, didn’t find something “ready” to do this.
What we’re going to do, is to do this manually - with the application being aware of the process.

alright, thanks for the quick response…

What you can do sometimes is use the fact that renaming a table is a very fast operation.

So by using a backup copy (to further minimize downtime) you can create a copy of your table. Do the alteration on that copy and then do a fast rename:

RENAME TABLE old_table TO backup_table, new_table TO old_table;

But if you have a lot of INSERT/UPDATEs then you will have to handle the ones that happens between you take the dump and when you switch. By for example replaying the INSERT/UPDATE queries that has taken place during this time.

In any way there is (as shlomoid noted) no ready feature for this scenario.

Good luck with your alterations. )

This is actually a good option (and the only one possible) - and is also sort of what I had in mind by “manually”. The “RENAME” is very fast and more importantly, atomic.

Right now I’m looking for a way to do this using partitions… And I still can’t figure out how, since many of the basic features in partitioning that one would expect are still not available. I can’t add a table as a partition, I can’t replace a partition with another table…

Maybe you have any ideas for a “trick” that can do the same magic? )

(The only option i see at this moment, is to implement partitions in the application itself…)

The only way that comes to my mind to solve this with as little downtime as possible is that you create the secondary copy of the table from your backup.
And then you write a small script that uses the output from the bin-log (replacing the table name with the name of the secondary table) to continuously write the changes to the secondary table.
And when the changes to the secondary table has caught up with the main table you temporarily place a LOCK on the main table perform the RENAME and the switch is ready.
The tricky part is the re-feeding of the bin-log but I don’t see any other way if you want to minimize downtime as much as possible. Just remember to replace the original table name with the name of the secondary copy of the table. Otherwise you have invented the new Perpetuum Mobile. :wink:

The read from bin-log is a great idea - it might save the application from being aware from what is happening. I can just do everything in the background. Hmm.

The bin-log part is indeed complex since it will require filtering of the statements to find out the ones that are relevant and need to be replayed. Never tried anything like this before, good time to learn. I’ll report here if I make anything out of it eventually.