We are looking into migrating some of our MySQL servers to AWS EC2, but our initial tests are not promising.
We attached 4 x EBS volumes (2000 PIOPS each) to an EBS-optimized instance, prewarmed the volumes as Amazon recommends, set up a RAID 0 array with mdadm, created an XFS filesystem on it, restored a backup, and configured the replication parameters (roughly the commands sketched below). The slave is not able to catch up to the master.
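For reference, the array and filesystem were created roughly like this. The chunk size and device names are taken from the mdstat output below; the dd prewarm loop is an approximation of what the AWS docs recommended for new volumes at the time, and the mount point is just an example:

# prewarm each (new, empty) volume by writing every block -- destructive, only for fresh volumes
$ for d in xvdf xvdg xvdh xvdi; do sudo dd if=/dev/zero of=/dev/$d bs=1M; done
# build the 4-disk RAID 0 array with a 256k chunk
$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=256 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
# XFS on top, mounted where the MySQL datadir lives
$ sudo mkfs.xfs /dev/md0
$ sudo mount /dev/md0 /var/lib/mysql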
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid0 xvdi[3] xvdh[2] xvdg[1] xvdf[0]
419429376 blocks super 1.2 256k chunks
unused devices: <none>
iostat output:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.20 0.00 0.60 0.00 3.20 10.67 0.00 1.33 0.00 1.33 1.33 0.08
xvdf 0.00 0.00 1.20 496.20 19.20 16995.20 68.41 1.23 2.47 0.67 2.47 0.66 32.72
xvdg 0.00 0.00 1.20 404.20 19.20 16628.00 82.13 0.85 2.09 0.67 2.10 0.44 17.92
xvdh 0.00 0.00 1.00 427.20 16.00 16776.80 78.43 1.03 2.39 0.00 2.40 0.53 22.64
xvdi 0.00 0.20 0.60 493.80 9.60 16947.20 68.60 1.14 2.30 0.00 2.30 0.63 31.04
md0 0.00 0.00 4.00 1820.40 64.00 67310.40 73.86 0.00 0.00 0.00 0.00 0.00 0.00
Write throughput on md0 is 67310 kB/s (about 67 MB/s), but iotop attributes much less than that to the mysqld threads:
Total DISK READ: 79.73 K/s | Total DISK WRITE: 64.24 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
5464 be/4 mysql 0.00 B/s 886.63 K/s 0.00 % 16.37 % mysqld --basedir=/usr --datadir=/var/lib/mys~ocket=/var/run/mysqld/mysqld.sock --port=3306
5465 be/4 mysql 54.22 K/s 173.82 K/s 0.00 % 3.18 % mysqld --basedir=/usr --datadir=/var/lib/mys~ocket=/var/run/mysqld/mysqld.sock --port=3306
5101 be/4 mysql 19.14 K/s 0.00 B/s 0.00 % 0.09 % mysqld --basedir=/usr --datadir=/var/lib/mys~ocket=/var/run/mysqld/mysqld.sock --port=3306
5100 be/4 mysql 6.38 K/s 11.96 K/s 0.00 % 0.07 % mysqld --basedir=/usr --datadir=/var/lib/mys~ocket=/var/run/mysqld/mysqld.sock --port=3306
I have no idea what is generating the remaining ~63 MB/s of writes. Interestingly, when I stop the slave, the write throughput drops to 0; after starting the slave again, the same strange I/O pattern returns.
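For what it's worth, here is a rough way to cross-check where the writes are coming from, beyond iotop (a sketch: pidstat comes from the sysstat package, and the pgrep call assumes a single mysqld process):

# per-process block I/O rates, 1-second samples
$ pidstat -d 1
# accumulated per-thread totals since iotop started, only threads actually doing I/O
$ sudo iotop -a -o
# kernel I/O counters for the whole mysqld process
$ sudo cat /proc/$(pgrep -x mysqld)/io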
Next, I tried a High I/O instance with ephemeral SSD storage. I used only one of the SSD disks, and MySQL was able to catch up to the master: it wrote about 30 MB/s until it caught up and sustained about 10 MB/s after that.
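(For completeness, that SSD test was just one ephemeral disk formatted and mounted directly, roughly as below; the /dev/xvdb device name is an assumption and may differ on other instance types.)

# single ephemeral SSD, no RAID -- device name assumed
$ sudo mkfs.xfs /dev/xvdb
$ sudo mount /dev/xvdb /var/lib/mysql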
So I suspect the EBS RAID 0 setup, but I have no idea how to fix it. Has anyone else run into this issue?
Both master and slave are running Percona Server 5.5.30-30.1-log (GPL, Release 30.1); the OS is Ubuntu 12.04.