Distributing Write Load

Peter remarks in his blog that write scaling can be a serious problem with MySQL replication (especially replication delay), since replication is single threaded: only one CPU on the slave can process the incoming replication stream, which becomes a problem if the master is, for example, an 8-core system with 20 hard drives:

http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/

We ran into exactly this problem recently while inserting 220 million rows of realtime stock data per day into our so-called “intraday” tables - meaning the data is cleared every night and only a small amount of historic data is written to some other tables.

The beginning of each day is no problem since the intraday tables are empty, so both INSERTs and UPDATEs are fast. As the tables fill up, the slave falls significantly behind, peaking at a delay of one hour!
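
(In case it helps: the delay figure above is simply what the slave reports as Seconds_Behind_Master; a standard way to watch it is the following.)

-- run on the slave; Seconds_Behind_Master is the replication delay in seconds
SHOW SLAVE STATUS\G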

Both machines have identical hardware (4-disk RAID 10, dual-core Xeon, 8 GB RAM, MySQL 4.0). The tables are InnoDB, manually partitioned by MOD’ing the integer primary keys, and quite small in row size:

Into these types of tables, bulk inserts are executed:

CREATE TABLE intraday_0 (
  instruments_id int(10) unsigned NOT NULL default '0',
  hour tinyint(2) unsigned zerofill NOT NULL default '00',
  minute tinyint(2) unsigned zerofill NOT NULL default '00',
  second tinyint(2) unsigned zerofill NOT NULL default '00',
  bid float unsigned NOT NULL default '0',
  KEY instruments_id (instruments_id,hour,minute)
) TYPE=InnoDB;
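
For illustration, a bulk insert into one of these partitions looks roughly like the sketch below (the values and the partition count of 4 are made up; the target table is chosen in the application as instruments_id MOD number-of-partitions):

-- hypothetical example: rows whose instruments_id MOD 4 = 0 go into intraday_0
INSERT INTO intraday_0 (instruments_id, hour, minute, second, bid) VALUES
  (1000, 9, 30, 1, 42.17),
  (1004, 9, 30, 1, 13.55),
  (1008, 9, 30, 2, 87.20);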

These types of tables are constantly updated with the most recent values:

CREATE TABLE snapshot_0 (
  instruments_id int(10) unsigned NOT NULL default '0',
  date date NOT NULL default '0000-00-00',
  bid float unsigned NOT NULL default '0',
  bid_high float unsigned NOT NULL default '0',
  bid_low float unsigned NOT NULL default '0',
  bid_open float unsigned NOT NULL default '0',
  bid_close float unsigned NOT NULL default '0',
  bid_trend enum('up','down','inv') NOT NULL default 'inv',
  bid_time time NOT NULL default '00:00:00',
  ask float unsigned NOT NULL default '0',
  ask_high float unsigned NOT NULL default '0',
  ask_low float unsigned NOT NULL default '0',
  ask_open float unsigned NOT NULL default '0',
  ask_close float unsigned NOT NULL default '0',
  ask_trend enum('up','down','inv') NOT NULL default 'inv',
  ask_time time NOT NULL default '00:00:00',
  last float unsigned NOT NULL default '0',
  last_high float unsigned NOT NULL default '0',
  last_low float unsigned NOT NULL default '0',
  last_open float unsigned NOT NULL default '0',
  last_close float unsigned NOT NULL default '0',
  last_trend enum('up','down','inv') NOT NULL default 'inv',
  last_time time NOT NULL default '00:00:00',
  volume_last_unit bigint(20) unsigned NOT NULL default '0',
  volume_cumulative_unit bigint(20) unsigned NOT NULL default '0',
  volume_cumulative_price bigint(20) unsigned NOT NULL default '0',
  PRIMARY KEY (instruments_id)
) ENGINE=InnoDB;
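
The snapshot updates are single-row statements keyed on the primary key, roughly like this (column list shortened, values made up):

-- hypothetical example: refresh the latest quote for one instrument
UPDATE snapshot_0
   SET bid = 42.17, bid_high = 42.30, bid_low = 41.95, bid_time = '09:30:01',
       ask = 42.19, ask_time = '09:30:01',
       last = 42.18, last_time = '09:30:01'
 WHERE instruments_id = 1000;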

InnoDB settings:

innodb_data_home_dir            = /var/lib/mysql
innodb_data_file_path           = ibdata1:60G
innodb_log_file_size            = 2000M
innodb_log_buffer_size          = 8M
innodb_buffer_pool_size         = 1800M
innodb_additional_mem_pool_size = 16M
innodb_file_io_threads          = 4
innodb_lock_wait_timeout        = 50
innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT

To solve this problem I am considering the following two solutions:

  1. Set up a true master-master replication and distribute the write load among two servers: each server receives half of all inserts / updates.
  • In this case the replication thread of each server would only need to insert / update half of the data - am I right? :) (A rough config sketch for this option follows the list.)
  2. Buy new machines with 16 RAID 10 disks.
  • Although Peter remarked that more disks and CPUs won’t help, I think more disks working in a RAID 10 certainly will. Of course only one CPU will work on the replication thread, but data will be inserted / updated much faster with 16 RAID 10 disks instead of 4 - am I right? :)
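
For reference, a minimal sketch of what option 1 would need on each box (hostnames, the replication user and the auto_increment settings are assumptions; the auto_increment options require MySQL 5.0+ and only matter if AUTO_INCREMENT keys are used):

# my.cnf on server A (server B uses server-id = 2 and auto_increment_offset = 2)
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto_increment_increment = 2     # avoid key collisions between the two masters
auto_increment_offset    = 1

and then on each server the slave thread is pointed at the other master, e.g.:

CHANGE MASTER TO MASTER_HOST='other-server', MASTER_USER='repl',
  MASTER_PASSWORD='...', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
START SLAVE;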

Thanks for any further suggestions about distributing write load or enhancing write speed in general.

I would recommend against option 2 also.
The speed increase that you can expect is too small to make any significant impact on your problem.

But if you are already prepared to write half of the data to each server and then replicate between them, I would recommend an alternative to option 1 instead.

Why not skip replication altogether and write all data to each server instead?
That way you can eliminate the MySQL replication bottleneck, and “replication” will be handled outside of MySQL.

Yes, you will have to handle the copying and writing to both servers yourself, but master-master replication is not a big win either.
And since the servers should be of equal performance and fast enough, you should never have a backlog of writes and both should stay up to date.

Thanks for your answer!

I was already thinking about the option you propose. The problem is that in our current setup we only have a master-slave replication, which is necessary to send all SELECTs to the slave, since a single master cannot handle both the INSERTs and the SELECTs at the same time (we did some benchmarks regarding this issue). So if we stay with the current hardware and WRITE & READ on both servers simultaneously, they won’t be able to handle the amount of queries. In that case we definitely need a hardware upgrade.

So you think the performance increase will not be significant?

Currently we have 4 RAID 10 disks, which gives roughly 2 * 200 seeks / second (random IO) = 400 rows / second.

With 14 disks in RAID 10 this becomes 7 * 200 seeks / second = 1400 rows / second.

increase = 250 % (theoretically of course)

So my question is basically whether a single CPU core (of the 8 cores available) will be able to benefit significantly from the IO increase of 14 disks instead of 4.


I was just thinking about another interesting idea: In the MySQL manual I found:

Quote:

For replication, the master server sends the events contained in its binary log to its slaves, which execute those events to make the same data changes that were made on the master.

[URL="http://dev.mysql.com/doc/refman/5.0/en/stored-procedure-logging.html"] http://dev.mysql.com/doc/refman/5.0/en/stored-procedure-logg ing.html[/URL]

Does this basically mean that the updates or inserts done by a trigger are NOT replicated directly, and that instead only the event itself is replicated?

If this is the case, we could use triggers to do the updates we require in the snapshot tables, which would take away a huge amount of the replication thread’s bandwidth. Is anybody experienced with triggers and replication?
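
For illustration only, such a trigger could look roughly like the sketch below (needs MySQL 5.0+; MAKETIME needs 4.1+; untested against our schema). I am not sure yet whether the slave would simply fire its own copy of the trigger when replaying the INSERT - that is part of what I am asking.

DELIMITER //
CREATE TRIGGER intraday_0_to_snapshot AFTER INSERT ON intraday_0
FOR EACH ROW
BEGIN
  -- keep the latest bid for this instrument in the snapshot table
  UPDATE snapshot_0
     SET bid      = NEW.bid,
         bid_time = MAKETIME(NEW.hour, NEW.minute, NEW.second)
   WHERE instruments_id = NEW.instruments_id;
END//
DELIMITER ;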

Yes, one master can’t handle all writes and reads.
But can one master handle all writes and half the reads?

Because no matter how you twist and turn this, both servers must at least handle all writes, otherwise they will fall behind.

But when it comes to reading you can split that load out on as many servers as you want in a round robin fashion.

I would like to know a bit more about your hardware and what it looks like under load.
What do the CPU states look like on the slave?

How much RAM do you have installed and how much is dedicated to the innodb index/data cache?

You can find the innodb settings and the hardware in my first post.

Unfortunately I cannot send you the data of the slave, as we have currently disabled the slave for further benchmarking and are executing all inserts / selects on the master.

system performance
iostat -xtc 5 2

Linux 2.6.16-2-amd64-k8-smp (inst01)    08/22/2007

avg-cpu:  %user  %nice   %sys  %iowait  %idle
          28.01   0.00  12.72     5.44  53.83

Device:    tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
sda      78.76     1372.89     1499.65  1731577431  1891450408

avg-cpu:  %user  %nice   %sys  %iowait  %idle
          39.04   0.00  20.72    22.52  17.72

Device:    tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
sda     263.07     2142.91    15920.16       10736       79760

dstat -am (30 second snippet)

----total-cpu-usage---- -disk/total -net/total- --paging-- --system-- --load-avg-- ------memory-usage-----
usr sys idl wai hiq siq| read write| recv  send| in   out| int   csw| 1m   5m  15m| used  buff  cach  free
28 10 54 5 1 2| 0 0 | 0 0 | 0 0 | 0 0 |7.63 6.67 6.35|5261M 46.0M 2685M 47.8M
55 15 9 15 1 4|2036k 5436k| 0 0 | 0 0 |4930 11.7k|7.63 6.67 6.35|5267M 45.9M 2681M 46.1M
63 18 7 8 2 4|1660k 4248k|2444k 1217k| 0 0 |4999 11.6k|7.26 6.61 6.33|5260M 45.9M 2686M 47.7M
59 16 7 10 1 6|1692k 2880k| 0 0 | 0 0 |5097 11.1k|7.26 6.61 6.33|5263M 45.8M 2677M 54.5M
50 16 9 18 1 5|1608k 10.3M|2829k 1553k| 0 0 |5600 12.1k|7.26 6.61 6.33|5260M 46.1M 2682M 52.5M
64 16 5 8 1 4|1408k 1364k| 0 0 | 0 0 |5838 13.6k|7.26 6.61 6.33|5260M 46.1M 2684M 49.2M
65 18 8 3 1 6|1004k 1684k|3246k 1623k| 0 0 |6500 15.0k|7.26 6.61 6.33|5262M 46.1M 2684M 48.5M
53 25 10 4 1 6| 764k 5108k| 0 0 | 0 0 |6326 14.9k|7.16 6.60 6.33|5262M 46.1M 2683M 48.2M
58 22 8 4 1 7|1368k 1400k|3938k 1974k| 0 0 |6299 14.5k|7.16 6.60 6.33|5264M 46.1M 2679M 51.2M
55 22 8 7 2 7| 752k 8892k| 0 0 | 0 0 |5971 11.8k|7.16 6.60 6.33|5265M 46.2M 2677M 51.5M
58 35 4 0 0 2| 260k 196k|3208k 1783k| 0 0 |3094 3590 |7.16 6.60 6.33|5267M 46.2M 2678M 49.0M
60 35 4 0 1 1| 324k 240k| 0 0 | 0 0 |3272 4146 |7.16 6.60 6.33|5261M 46.2M 2678M 54.5M
69 25 3 0 0 2| 0 4408k|1625k 958k| 0 0 |3254 4160 |7.07 6.59 6.33|5270M 46.2M 2670M 53.4M
62 27 5 3 2 3|2156k 3904k| 0 0 | 0 0 |3712 4786 |7.07 6.59 6.33|5261M 46.2M 2673M 60.0M
46 16 29 8 0 2| 352k 1876k|1722k 1045k| 0 0 |3430 4497 |7.07 6.59 6.33|5262M 46.3M 2674M 57.4M
40 9 18 27 1 4|2012k 8248k| 0 0 | 0 0 |4008 9271 |7.07 6.59 6.33|5259M 46.3M 2681M 53.0M
38 15 18 24 0 4|1948k 5588k|2138k 973k| 0 0 |4010 8960 |7.07 6.59 6.33|5261M 46.3M 2686M 47.4M
44 19 12 20 0 6|1444k 6844k| 0 0 | 0 0 |4473 9661 |7.46 6.68 6.36|5260M 46.3M 2685M 49.0M
35 11 5 44 2 4|1160k 3516k|2540k 1242k| 0 0 |4679 9729 |7.46 6.68 6.36|5267M 46.2M 2680M 46.8M
38 17 15 25 1 5|1292k 8148k| 0 0 | 0 0 |5726 12.4k|7.46 6.68 6.36|5261M 46.1M 2677M 56.1M
50 15 12 14 1 6|1156k 9220k|2810k 1432k| 0 0 |5453 12.2k|7.46 6.68 6.36|5262M 46.1M 2682M 50.2M
42 12 19 21 1 4|1304k 5076k| 0 0 | 0 0 |4855 10.6k|7.46 6.68 6.36|5260M 46.1M 2680M 53.3M
55 18 9 9 2 8|1324k 5396k|3039k 1586k| 0 0 |6110 12.6k|7.10 6.62 6.34|5260M 46.1M 2686M 48.2M
42 15 25 13 1 4|1428k 9008k| 0 0 | 0 0 |5027 9897 |7.10 6.62 6.34|5260M 46.1M 2687M 47.2M
51 20 16 4 1 8| 772k 1152k|3386k 1751k| 0 0 |6265 13.3k|7.10 6.62 6.34|5260M 46.1M 2685M 48.9M
45 21 11 16 1 5| 516k 11.5M| 0 0 | 0 0 |5637 11.3k|7.10 6.62 6.34|5262M 46.1M 2683M 49.3M
46 18 16 11 1 8| 932k 3904k|3547k 1830k| 0 0 |6086 12.7k|7.10 6.62 6.34|5262M 46.1M 2687M 45.1M
50 21 13 11 1 5|1700k 3152k| 0 0 | 0 0 |6017 13.8k|6.94 6.59 6.33|5261M 46.1M 2685M 48.4M
51 22 9 10 1 6| 916k 3132k|3466k 1747k| 0 0 |6504 13.2k|6.94 6.59 6.33|5263M 46.1M 2678M 52.5M
50 22 13 7 2 7|1144k 5072k| 0 0 | 0 0 |6895 14.6k|6.94 6.59 6.33|5260M 46.1M 2684M 49.9M
39 17 11 25 2 6| 664k 12.6M|3811k 1870k| 0 0 |5374 11.3k|6.94 6.59 6.33|5262M 46.1M 2684M 48.8M
33 13 36 15 0 5|1468k 7836k| 0 0 | 0 0 |3793 8498 |6.94 6.59 6.33|5265M 46.1M 2681M 47.9M
42 20 26 6 1 4| 752k 5456k|2425k 1200k| 0 0 |4690 10.2k|6.86 6.58 6.33|5262M 45.8M 2680M 52.0M
46 15 15 18 0 6|2460k 4996k| 0 0 | 0 0 |5224 11.9k|6.86 6.58 6.33|5265M 45.8M 2680M 49.4M
51 24 8 11 2 5|1928k 5036k|2502k 1212k| 0 0 |5652 12.5k|6.86 6.58 6.33|5261M 45.7M 2680M 53.3M
39 17 18 18 1 7|1508k 11.3M| 0 0 | 0 0 |5379 12.3k|6.86 6.58 6.33|5261M 45.8M 2686M 46.8M
44 23 19 5 2 8| 320k 5648k|3482k 1821k| 0 0 |6582 14.4k|6.86 6.58 6.33|5261M 45.7M 2677M 56.9M

As you can see, we have a lot of IO wait, which indicates that our disks are the bottleneck in a single-server setup.

If you wonder why we only have

innodb_buffer_pool_size = 1800M

it is because the current MySQL version combined with our current kernel segfaults if we set the buffer pool size higher than 1.9 GB.

any other info you need?

Aha, how large is your DB on disk?

Because I would suggest that you start by testing and running a newer version of MySQL to fix the segfault problem that limits you to an innodb_buffer_pool_size of 1800M, so that you can increase the buffer and avoid the disk seeks altogether.

Since with your present RAM you can more than double the size of the InnoDB cache, and you have the possibility to add more RAM later, I would say that this will give you more benefit than adding more disks (famous last words :wink: ).
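
For example (the exact value is an assumption and depends on what else on the box needs RAM), after an upgrade to a 64-bit mysqld that no longer segfaults you could simply raise:

# assumes a 64-bit mysqld that no longer segfaults above ~1.9G
innodb_buffer_pool_size = 5G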

DBMS rule 1A: The more RAM the merrier. :)

As an example:
One of the servers I’m working with has:
8G RAM
mysql 5.0.27
innodb_buffer_pool_size = 5G
2.6.9-55.ELsmp #1 SMP Wed May 2 14:04:42 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
4 Intel(R) Xeon(R) CPU X5355 @ 2.66GHz

And it currently averages 1400 qps (30% write queries) at about 6-7% CPU and 0% IO wait.
Of course this is mostly due to the fact that we are able to fit the entire DB into the 5G InnoDB cache.

Well yes, I know about this rule and you are completely right :)

The problem is, our database on disk is around 20 GB (InnoDB) and we also have some MyISAM tables (around 5 GB), so we still need the key buffer. In our scenario the InnoDB tables get “refilled” every day, so in my opinion a larger InnoDB buffer pool would just delay the IO problems to a later point in the day - exactly the time when the buffer is full and MySQL starts kicking indices and data out of the buffer…

Well, I’m gonna buy some 64 GB RAM machines then :wink:

Thanks for your time and answers! The first thing I’ll do is certainly a MySQL upgrade…