Question: Is there a way to identify which tables / databases on a server are getting the majority of queries? Im looking to start replicating the busier tables in order to offload queries to secondary machines.
Situation: I work in a research group where we collect behavioral data from 300+ homes nationwide(US). The data comes in a variety of forms ranging from medication-schedule adherence to mouse movements to x10 sensor signals. We record ~2 billion records per year. The study will go for 5 years with probable extension.
Setup: dedicated MySQL server. 16 3Ghz cpu ‘cores’, 16Gb RAM, 1Tb RAID 6(?) for data plus an additional 3Tb for backups, logs, etc. Linux RH enterprise 2.6.18-92.1.22.el5. MySQL v5.1.20(beta)
30+ databases divided up (generally) by project. The largest of these has 200+ tables with 3+ billion records. Tables are (almost) exclusively innodb.
I have 2 other machines available with similar but slightly diminished capacity. Im looking to start replicating some of these tables so that I can move the majority of reads to other machines. The current db server is not CPU bound but the amount of IO wait is increasing over time.
Users are only allowed to write to one database as a scratch space. Most use temporary tables to store query data. The primary form of data access is via Matlab.