Intermittent random MySQL queries stuck on "Writing to net" state

Hi,

I have been struggling with this issue for several months now and desperately need some help to figure out what is going on.

I have a dedicated single server setup with the specs below;

[LIST]
[]CentOS 6 64bit cpanel/WHM
[
]Dual Intel Xeon E5-2620 - Hex Core 2.0GHz, 15MB Cache, HyperThreaded [12 cores, 24 threads]
[]24GB RAM
[
]2 x 240GB SSD
[]Varnish Full Page Cache (using Pound as an SSL terminator and ESI to inject dynamic content)
[
]Apache 2.4
[]PHP 5.5 (PHP-FPM handler and Zend OpCache enabled)
[
]mysql 5.6
[*]Magento 1.9.1.0 web app
[/LIST]
I am running two Magento installs on this server, one is production and the other is for development/staging. The production site gets around 1,500 visits per day.

After a while of MySQL running I start to notice lag spikes when using the website, processlist shows a random query will seemingly get stuck in a “Writing to net” state for fairly long periods of time, anywhere from 10 seconds to sometimes over a minute. That query is rarely ever the same and runs in milliseconds from cli or mysql workbench.

I setup pt-stalk to collect server data and email me on collect triggered whenever a query sits in “Writing to net” for longer than 5 seconds. I am no sys admin but nothing stands out to me looking at the pt-stalk output during these collects. Admittedly, I don’t really know what I am looking at so I asked my host (web24) and even their senior system admin said that nothing stood out to him and eventually told me that I needed to get a DB sys admin to look into it as it is beyond their capability.

So now I am stuck with this odd issue where I need to restart mysql when I start to get flooded with pt-stalk collect triggered emails. Restarting mysql makes the issue go away for a while, sometimes I see the issue come back within hours, other times it may take 24 hours, I am guessing it may have something to do with the busyness of the server.

I have done everything I can to optimise the site and any queries that run but no matter what, eventually “Writing to net” strikes. When this issue occurs the server seems to have plenty of resources available, such as ram and cpu.

I feel as though something is causing the “network” to get bogged down, but on a single server setup like this were our services are all communicating via localhost I just don’t know where to look for clues.

my.cnf looks like this;

[mysql]

# CLIENT #
port = 3306
socket = /var/lib/mysql/mysql.sock


[mysqld]

# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/lib/mysql/mysql.sock
#tmpdir = /mysqltmp

# MyISAM #
key_buffer_size = 64M
myisam_recover_options = FORCE,BACKUP

# SAFETY #
max_allowed_packet=268435456
max_connect_errors = 1000000
skip-name-resolve
innodb = FORCE

# DATA STORAGE #
datadir = /var/lib/mysql/

# BINARY LOGGING #
#log-bin = /var/lib/mysql/mysql-bin
#expire_logs_days = 14
#sync_binlog = 1

# CACHES AND LIMITS #
wait_timeout = 300
query_cache_type = 0
query_cache_size = 0
max_connections = 300
key_buffer = 64M
table_open_cache = 12000
join_buffer_size = 3M

# INNODB #
innodb_log_file_size = 256M # if changing, stop database, remove old log files, then start!
innodb_file_per_table = 1
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 12

# LOGGING #
#log-error = /var/lib/mysql/mysql-error.log
#log-queries-not-using-indexes = 1
#slow-query-log = 1
#long_query_time = 5
#slow_query_log_file = /var/lib/mysql/mysql-slow.log

Has anybody experienced this before? I am kind of desperate so any help would be mega appreciated.

What happens if you restore the maximum packet size to the default value? 256MB seems inappropriately high (although the memory is not allocated).

Besides, you should not mess with join_buffer_size, unless you know exactly what you’re doing. If some queries are too slow on joins, generally it’s better to work out the given queries.

Hello Wonderful People,

Even I am facing similar issue something like this. We have recently migrated our environment to cloud and since then few queries stay in the writing to net state for more than 2500 seconds.

I tried to tweak my net_read_timeout variable and net_write_timeout variable. But this is of no use.

I checked my kernel version which was 2.6.32-358.el6.x86_64 and our os is CentOS release 6.4 (Final) … Is the kernel causing this issue?

Any help would be greatly appreciated. Thank you