Master Slave replication

Hi there

I am here because of an application issue. To make it easy, the application is basically some sort of forum it creates and ID when someone draft a post and then it will read it.

The big deal is that we have one master and two slaves. The application will do all writes to the master and all reads to the slaves, just like unaware of replication. The case above cause issues because these two actions are done sequentially and when a lot of replication is occurring, the application will throw an obvious error that the data which were INSERT into the master are not yet available in the slave.

The delay is about 1 to 3 seconds and always varies. I know that the application should be replication aware for this case, but it is not what I am trying to solve. My first question is to find whether anything very obvious in my configuration could explain the delay.

The three servers are in the cloud and each with the exact same configuration. While I am not a fan of that, the latency and connectivity between each slave and the master is excellent (about 0.0200 ms latency average and 400 Mbit/s connectivity). Each server has 68 GB of RAM and 8 Intel(R) Xeon(R) CPU X5550 @ 2.67GHz CPU, running CentOS 5.5 x86_64 (LM flag activated) and MySQL Server is the 5.5.9-log. Each indicator clearly shows that the machine has plenty of free resources.

The configuration used is:

[mysqld]user=mysqllog_bin = mysql-binserver-id = 1000 # Each has a unique ID…#relay_log = mysql-relay-log # on slave only#log_slave_updates = 1 # on slave only#read_only = 1 # on slave onlybinlog_cache_size=1Mft_min_word_len=4innodb_additional_mem_pool_size = 32Minnodb_buffer_pool_size = 60Ginnodb_data_file_path = ibdata1:10M:autoextendinnodb_file_per_table = 1innodb_flush_log_at_trx_commit = 0innodb_log_buffer_size = 8Minnodb_log_file_size = 256Minnodb_thread_concurrency = 0innodb_status_file = 0innodb_flush_method = O_DIRECTinnodb_log_files_in_group = 2innodb_io_capacity = 10000innodb_write_io_threads = 4innodb_read_io_threads = 4join_buffer_size = 2Mmax_allowed_packet = 1Mmax_connections = 15000max_heap_table_size = 64Mmax_prepared_stmt_count = 16382query_cache_size = 0sort_buffer_size = 2048Ktable_cache = 2048thread_cache = 0thread_cache_size = 0thread_concurrency = 10thread_stack = 196Kquery_cache_size = 128Mtmp_table_size = 16Mtransaction_isolation = REPEATABLE-READlocal-infile = 1

This configuration has been heavily tuned and gave very good performances so far. Other than the replication delay (which does not really scares me more than that actually), do you see anything wrong? Anything that would make you yell?

Greg

w00t wrote on Tue, 15 March 2011 15:47

Nope, not for me at least.

As for the delay I would start by checking your IO stats since:

  1. You are allowing 60G for InnoDB and that only leaves 8G for OS and all connections so double check swapping.
  2. Replication passes through the replication log written to disk before written to DB.

And check that you don’t have a problem with that the replication is single threaded on the slave, while the inserts/updates coming in to the master are spread out over a lot of connections. But since you say it seems to recover and never pass 3 seconds it sounds as a less likely scenario.