I have been struggling with this issue for several months now and desperately need some help to figure out what is going on.
I have a dedicated single server setup with the specs below;
CentOS 6 64bit cpanel/WHM
Dual Intel Xeon E5-2620 - Hex Core 2.0GHz, 15MB Cache, HyperThreaded [12 cores, 24 threads]
2 x 240GB SSD
Varnish Full Page Cache (using Pound as an SSL terminator and ESI to inject dynamic content)
PHP 5.5 (PHP-FPM handler and Zend OpCache enabled)
[*]Magento 184.108.40.206 web app
I am running two Magento installs on this server, one is production and the other is for development/staging. The production site gets around 1,500 visits per day.
After a while of MySQL running I start to notice lag spikes when using the website, processlist shows a random query will seemingly get stuck in a “Writing to net” state for fairly long periods of time, anywhere from 10 seconds to sometimes over a minute. That query is rarely ever the same and runs in milliseconds from cli or mysql workbench.
I setup pt-stalk to collect server data and email me on collect triggered whenever a query sits in “Writing to net” for longer than 5 seconds. I am no sys admin but nothing stands out to me looking at the pt-stalk output during these collects. Admittedly, I don’t really know what I am looking at so I asked my host (web24) and even their senior system admin said that nothing stood out to him and eventually told me that I needed to get a DB sys admin to look into it as it is beyond their capability.
So now I am stuck with this odd issue where I need to restart mysql when I start to get flooded with pt-stalk collect triggered emails. Restarting mysql makes the issue go away for a while, sometimes I see the issue come back within hours, other times it may take 24 hours, I am guessing it may have something to do with the busyness of the server.
I have done everything I can to optimise the site and any queries that run but no matter what, eventually “Writing to net” strikes. When this issue occurs the server seems to have plenty of resources available, such as ram and cpu.
I feel as though something is causing the “network” to get bogged down, but on a single server setup like this were our services are all communicating via localhost I just don’t know where to look for clues.
my.cnf looks like this;
[mysql] # CLIENT # port = 3306 socket = /var/lib/mysql/mysql.sock [mysqld] # GENERAL # user = mysql default-storage-engine = InnoDB socket = /var/lib/mysql/mysql.sock #tmpdir = /mysqltmp # MyISAM # key_buffer_size = 64M myisam_recover_options = FORCE,BACKUP # SAFETY # max_allowed_packet=268435456 max_connect_errors = 1000000 skip-name-resolve innodb = FORCE # DATA STORAGE # datadir = /var/lib/mysql/ # BINARY LOGGING # #log-bin = /var/lib/mysql/mysql-bin #expire_logs_days = 14 #sync_binlog = 1 # CACHES AND LIMITS # wait_timeout = 300 query_cache_type = 0 query_cache_size = 0 max_connections = 300 key_buffer = 64M table_open_cache = 12000 join_buffer_size = 3M # INNODB # innodb_log_file_size = 256M # if changing, stop database, remove old log files, then start! innodb_file_per_table = 1 innodb_buffer_pool_size = 12G innodb_buffer_pool_instances = 12 # LOGGING # #log-error = /var/lib/mysql/mysql-error.log #log-queries-not-using-indexes = 1 #slow-query-log = 1 #long_query_time = 5 #slow_query_log_file = /var/lib/mysql/mysql-slow.log
Has anybody experienced this before? I am kind of desperate so any help would be mega appreciated.