Partial replication

Hi

I want to create a reporting server to move huge load from transaction server into it. The problem is that there is so many insert/updates in transaction database that simple replication of entire database won’t give us much. I heard partly replication is possible so we could replicate only tables which are used by reports.

Here is big picture/use case of what I’m trying to acheive with mysql replication. Assume we have table “logs” and “summary”. Every hour about 2GB of data is imported into “logs” table. Every hour that data is summarized into “summary” table using some kind of “INSERT INTO summary … SELECT COUNT(*)… FROM logs WHERE …”. There are more tables like this one and they are summarized from many other tables using JOINS.
I’m wondering if it’s possible to replicate only the “summary” table on slave server? How it works in that situation if we will ignore logs table? Will mysql replication just send new and updated rows inside “summary” table to slave? Or is it neccessary that “logs” table have to exists on slave in order to summarize the data from it on slave as well?

Thanks in advance for any tips

Radek

Hi Radek,

MySQL Replication in MySQL 5.0 and below simply sends statements from master to the slaves (with exception of LOAD DATA INFILE) so if you do not replicate log table creation of summary table will fail on the slaves. Also it could be too expensive to run summary creation table on each of the slaves.

What you could do instead is do summary creation on the master to table which is not replicated and then do something like:

select * from tmp_summary into outfile ‘/tmp/dump.txt’;

load data infile ‘/tmp/dump.txt’ into table summary;

truncate table tmp_summary;

In such case summary will be updated just fine as you’re using load date infile.

Note you will need to delete ‘/tmp/dump.txt’ for this to work second time.

Hey Peter

Thanks for great idea! It sounds like it could be done. My only concern here is that it will involve changing existing maintenance process which summarize that data. I’m wondering if it would be possible to somehow force mysql to replicate that data after summarization is finished. Maybe by some kind of:

SELECT * FROM summary INTO OUTFILE ‘/tmp/dump.txt’ WHERE date BETWEEN {maintenance_start_hour} AND {maintenance_end_hour};
LOAD DATA INFILE ‘/tmp/dump.txt’ REPLACE INTO TABLE summary;
TRUNCATE TABLE tmp_summary;

*REPLACE is for situations when we need to rerun maintenance

I think in this way I will not have to even touch (refactor) existing summary process.

Do you know if there are any other options to do it? However it sounds for me that probably there is no better option.

Thanks!
Radek

Maybe you could somehow use the blackhole filter engine solution I wrote about very recently. Just have a look at this:
http://jroller.com/page/dschneller?entry=mysql_replication_u sing_blackhole_engine

Daniel

Thanks for your comment,

Blackhole engine is helpful with replication but in different case- it does not help INSERT … SELECT and similar cases when data is being processed as tables need to be on the same host as query is executing.

However blackhole engine is helpful in other cases as you’ve mentioned:

  • Filtering of logs, especially if you have partial replication over WAN

  • Mass distribution - you do not want to feed 50 slaves from loaded master directly.

Oh, I see. I misread the first post. I thought the summarizing process was taking place on an application level. In that case partial replication of only the summary tables (INSERTs of summarized values, coming from the application) would have worked and the blackhole engine would have prevented all the raw data inserts from being transported over the wire.

Right,

With summarization done on application level that would be helpful.
On other hand I would say simple replicate-do-wild-table or replicate-ignore-wild-table is often enough - if you replicate in the local network and do not have too many servers or very high write load log shipping to the slave might cause minimal overhead.

[B]Peter wrote on Fri, 11 August 2006 08:09[/B]
Hi Radek,

MySQL Replication in MySQL 5.0 and below simply sends statements from master to the slaves (with exception of LOAD DATA INFILE) so if you do not replicate log table creation of summary table will fail on the slaves. Also it could be too expensive to run summary creation table on each of the slaves.

What you could do instead is do summary creation on the master to table which is not replicated and then do something like:

select * from tmp_summary into outfile ‘/tmp/dump.txt’;

load data infile ‘/tmp/dump.txt’ into table summary;

truncate table tmp_summary;

In such case summary will be updated just fine as you’re using load date infile.

Note you will need to delete ‘/tmp/dump.txt’ for this to work second time.

And in MySQL 5.1, you can use row-based replication to filter (on slave) away everything execept the tables you are interested in.

Using row-based replication, the underlying log table is no longer needed on slave, since only data changes of the summary table is replicated instead of the SQL statements used to update the table.


Dr. Lars Thalmann
Replication and Clustering Technology
MySQL AB, www.mysql.com