Intermintent slow MySQL Connections

Hello,

We keep getting errors like this on our PHP sites.

Can’t connect to MySQL server on ‘192.168.100.85’ (4)

web is the web server 192.168.1.116
mysql is the mysql server 192.168.100.85

So I built a script on the web server to make 10,000 mysql connections in a row and time them. It will “sometimes” produce the error. Most times everything runs fine and there is an average time of 5-10 ms for the call to mysql_connect

Some serious Googling showed that the (4) error is due to the connection being cut off by the timeout.

web# grep mysql.connect_timeout /etc/php.ini mysql.connect_timeout = 1

So I modified the timeout in the script to 30 to see if it would work. The connection errors went away, but occasionally the connection would take 5 seconds.

After more Googling and some tcpdump I found that occasionally when the MySQL server is doing its reverse lookup of the IP the DNS server would fail to respond. So after 5 seconds it would give up and allow the connection.

I have since added skip-name-resolve to the server. But this did not solve the problem.

Now my test would show slow connections taking 3-4.5 seconds instead of the set 5 with the DNS issue.

So I ran my tests again with tcpdump running on both ends.

web# tcpdump -n -s 65535 -w web3-$(date +“%F_%H-%M-%S”).pcap host 192.168.100.85 and port 3306 mysql# tcpdump -n -s 65535 -w master1-$(date +“%F_%H-%M-%S”).pcap host 192.168.1.116 and port 3306

Here is the packets from the relevant slow connection.

Packets from web:

No. Time Source Destination Protocol Info 13312 2010-10-13 10:01:01.201965 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=904829062 TSER=0 WS=2 13316 2010-10-13 10:01:04.201577 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=904832062 TSER=0 WS=2 13317 2010-10-13 10:01:04.204837 192.168.100.85 192.168.1.116 TCP mysql > 41560 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1380 TSV=562240314 TSER=904832062 WS=7 13318 2010-10-13 10:01:04.204853 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=904832065 TSER=562240314 13319 2010-10-13 10:01:04.205886 192.168.100.85 192.168.1.116 MySQL Server Greeting proto=10 version=5.0.77-log 13320 2010-10-13 10:01:04.205899 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=61 Win=5840 Len=0 TSV=904832066 TSER=562240316 13321 2010-10-13 10:01:04.205959 192.168.1.116 192.168.100.85 MySQL Login Request userexample 13322 2010-10-13 10:01:04.206800 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=61 Ack=71 Win=5888 Len=0 TSV=562240317 TSER=904832066 13323 2010-10-13 10:01:04.206874 192.168.100.85 192.168.1.116 MySQL Response OK 13324 2010-10-13 10:01:04.208823 192.168.1.116 192.168.100.85 MySQL Request Quit 13325 2010-10-13 10:01:04.208839 192.168.1.116 192.168.100.85 TCP 41560 > mysql [FIN, ACK] Seq=76 Ack=72 Win=5840 Len=0 TSV=904832069 TSER=562240317 13326 2010-10-13 10:01:04.210422 192.168.100.85 192.168.1.116 TCP mysql > 41560 [FIN, ACK] Seq=72 Ack=76 Win=5888 Len=0 TSV=562240320 TSER=904832069 13327 2010-10-13 10:01:04.210437 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=77 Ack=73 Win=5840 Len=0 TSV=904832071 TSER=562240320 13328 2010-10-13 10:01:04.210567 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=73 Ack=77 Win=5888 Len=0 TSV=562240320 TSER=904832069

Packets from mysql:

No. Time Source Destination Protocol Info 13315 2010-10-13 10:01:04.204817 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=904832062 TSER=0 WS=2 13316 2010-10-13 10:01:04.204836 192.168.100.85 192.168.1.116 TCP mysql > 41560 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=562240314 TSER=904832062 WS=7 13317 2010-10-13 10:01:04.206611 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=904832065 TSER=562240314 13318 2010-10-13 10:01:04.206808 192.168.100.85 192.168.1.116 MySQL Server Greeting proto=10 version=5.0.77-log 13319 2010-10-13 10:01:04.207658 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=61 Win=5840 Len=0 TSV=904832066 TSER=562240316 13320 2010-10-13 10:01:04.207815 192.168.1.116 192.168.100.85 MySQL Login Request user=example 13321 2010-10-13 10:01:04.207872 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=61 Ack=71 Win=5888 Len=0 TSV=562240317 TSER=904832066 13322 2010-10-13 10:01:04.207910 192.168.100.85 192.168.1.116 MySQL Response OK 13323 2010-10-13 10:01:04.210817 192.168.1.116 192.168.100.85 MySQL Request Quit 13324 2010-10-13 10:01:04.210849 192.168.100.85 192.168.1.116 TCP mysql > 41560 [FIN, ACK] Seq=72 Ack=76 Win=5888 Len=0 TSV=562240320 TSER=904832069 13325 2010-10-13 10:01:04.211632 192.168.1.116 192.168.100.85 TCP 41560 > mysql [FIN, ACK] Seq=76 Ack=72 Win=5840 Len=0 TSV=904832069 TSER=562240317 13326 2010-10-13 10:01:04.211640 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=73 Ack=77 Win=5888 Len=0 TSV=562240320 TSER=904832069 13327 2010-10-13 10:01:04.213243 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=77 Ack=73 Win=5840 Len=0 TSV=904832071 TSER=562240320

As you can see, web re-sent the initial ACK after 3 seconds of no response. But mysql never even saw that initial packet.

I also tried to run a ping flood to check for dropped packets. If you leave it running long enough you will get dropped packets.

web3# ping -f 192.168.100.85PING 192.168.100.85 (192.168.100.85) 56(84) bytes of data…— 192.168.100.85 ping statistics —38253 packets transmitted, 38185 received, 0% packet loss, time 460851msrtt min/avg/max/mdev = 0.880/3.430/66.904/8.015 ms, pipe 7, ipg/ewma 12.047/1.378 ms

This issue is very intermittent but is happening all the time. I do understand that simply increasing our timeout would help greatly, but I’d rather have the connects always be fast and never have a 3-4 second delay in serving a page.

I contacted our hosting provider and they say its likely a result of the bad interaction between Nagle’s Algorithm and Delayed ACK. <[URL]http://www.stuartcheshire.org/papers/NagleDelayedAck/[/URL]>

It seems to me like packets are being dropped. Any ideas on a way I could better prove it to the hosting guys? Ping dropping 60 out of 38,000 packets does seem kind of minor. Is this something I should just live with?

Thank you for your time in looking into this!

Did you check if either the conntrack or kernel cookies are not the issue here ? On mysql server ofc )

conntrack is part of iptables correct? If so, iptables is off on both servers and shouldn’t be a problem.

I don’t even know where to begin checking if “kernel cookies” is an issue. Can you point me in the right direction?

Did you check for any evidence in mysql error log ?

Than maybe it is some network related problem ? ifconfig RX/TX packets errors, dropped ?
Also you might need to look at application server for similiar conntrack, cookies, network problems…

I got a hold of someone else at the hosting company. They seem to think our firewall it underpowered and is dropping packets during peak times. I will keep working with them to get this resolved.

Thanks for your help!

Sorry i can’t help you because i does not understand your problem.But i hope some one will probably help you soon. :slight_smile: