Hey guys. I’ve got some large datasets at work and I’m constantly struggling with performance issues. I’m hoping you can help me analyze my MySQL configuration and offer some suggestions.
Here’s my setup: we store firewall log data for each of our clients. We split the data across four tables in a one-to-one mapping: one table (named ‘event’) stores the information that is common to every event, and the other three tables (‘event_detail’, ‘event_ip_detail’, and ‘event_icmp_detail’) hold the data that applies only to some of the events. The decision to split the data up was made before I got here, to save disk space on the records that don’t need those extra columns, and now we have to deal with joining the data back together whenever we want to use it. Eventually I want to redesign the schema and put it all in one table, but that’s not an option for now.
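Just to give a rough idea of the shape of it, a typical read ends up looking something like the query below (the column names are simplified stand-ins, not our actual schema):

SELECT e.event_id, e.event_time, d.message, ip.src_ip, ip.dst_ip
FROM event e
LEFT JOIN event_detail d ON d.event_id = e.event_id
LEFT JOIN event_ip_detail ip ON ip.event_id = e.event_id
LEFT JOIN event_icmp_detail ic ON ic.event_id = e.event_id
WHERE e.event_time BETWEEN '2006-01-01 00:00:00' AND '2006-01-02 00:00:00';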
We’ve got seven database servers total, each with two Xeon processors (ranging between 2.0 and 3.2 GHz on different servers, all with HyperThreading enabled), 2.5GB RAM, and two 146GB 10k RPM SCSI disks joined together with LVM. All of the servers are running CentOS 4 (Linux), and the version of MySQL on every server is 4.1.20-standard-log.
We store data for multiple clients, and our data is partitioned at the client level: each client’s data is stored in a separate database, and each database has the same four tables inside, using InnoDB with innodb_file_per_table. The larger clients (200-300 GB each) get a server to themselves, and the smaller ones (10-70 GB each) share one server.
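So on disk, each client works out to its own directory of .ibd files under the datadir (/var/lib/mysql by default on these boxes), roughly like this (the client names here are made up):

/var/lib/mysql/client_acme/event.ibd
/var/lib/mysql/client_acme/event_detail.ibd
/var/lib/mysql/client_acme/event_ip_detail.ibd
/var/lib/mysql/client_acme/event_icmp_detail.ibd
/var/lib/mysql/client_foocorp/event.ibd
...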
Each of the tables in the larger databases has on the order of 300 million rows.
As far as read vs. write usage goes (selects vs. inserts), ours isn’t clearly biased one way or the other the way it is in some setups (like web servers doing mostly selects). If anything, we do a little more inserting than selecting, since we’re pretty much always inserting new events. We then have a series of scripts that run once an hour and once a day to calculate various statistics from the data. Most of the rest of our usage is based on those calculated statistics, so that’s about the last time the raw data itself gets touched until we delete it 45 days later.
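For what it’s worth, the hourly rollups and the 45-day purge boil down to big aggregate and delete statements, something along these lines (the stats table and columns here are simplified stand-ins, not our real queries):

INSERT INTO hourly_stats (hour_start, event_count)
SELECT '2006-01-01 00:00:00', COUNT(*)
FROM event
WHERE event_time >= '2006-01-01 00:00:00'
  AND event_time < '2006-01-01 01:00:00';

DELETE FROM event
WHERE event_time < DATE_SUB(NOW(), INTERVAL 45 DAY)
LIMIT 10000;

The LIMIT on the delete in that sketch is just meant to show chunking the purge rather than doing it in one giant transaction.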
Here are the relevant portions of my MySQL config from one of the servers. This one has 2GB of RAM and dual 3.2 GHz Xeons with 2MB cache.
skip-name-resolve
skip-locking
set-variable = max_relay_log_size=256M
set-variable = key_buffer=384M
set-variable = max_allowed_packet=25M
set-variable = table_cache=128
set-variable = sort_buffer=4M
set-variable = read_buffer_size=5M
set-variable = myisam_sort_buffer_size=16M
set-variable = tmp_table_size=768M
innodb_file_per_table
innodb_data_file_path = ibdata1:100M:autoextend
set-variable = innodb_log_files_in_group=3
set-variable = innodb_log_file_size=250M
set-variable = innodb_log_buffer_size=16M
set-variable = innodb_buffer_pool_size=1536M
set-variable = innodb_additional_mem_pool_size=20M
set-variable = innodb_lock_wait_timeout=1000
set-variable = innodb_flush_log_at_trx_commit=1
Any ideas? If there are any other stats that would be useful, let me know, and I can post them. Thanks a lot.
Dan