I’m currently running a mysql 5.6 in a master/slave setup on AWS EC2. Our database is roughly 4TB in size, growing by several GB per day. Our “big” tables are 400-500M rows, with our biggest being 1.3B rows. These big tables are receiving about 300k new rows per day.
Right now, both servers are in the same availability zone, and are running into slow queries causing locks on summary tables, leading to a pile up of connections and other ugly problems. My initial thought for the summary table issue is to move them to a separate database and allow writes to queue up to avoid issues from table locks. This solution carries a lot of overhead for us operationally and i’m not sure if it’s the right answer (we were going to move summary tables to amazon redshift, and do real-time reporting from there)
Moving the summary tables still doesn’t fix the glaring problem of databases being in the same zone, and of having no automated failover. We’re trying to get to a 99.99% (or higher) uptime. We can add web servers to scale web load, but right now, our database solution requires application level changes to query any additional slaves, as well as administrative overhead.