Issue restarting server under specific conditions

Hi,

We’ve got an old backup script which runs once a day and does a hot backup of one of our production slaves. It could do with an update using LVM snapshots, but for the moment it works like:
[LIST]
[]STOP SLAVE;
[
]
[]Server shuts down
[
]
[]Ensure server is shutdown, then copy datadir to backup server
[
]
[]Start Percona
[
]
[]Ensure Slave has started again and caught up to master
[/LIST] Basically, I’ve had an issue approximately once every 1 - 2 months (30 - 60 restart/backup runs) where Percona refuses to start again because it can’t bind to the port 3306 (“Address already in use”). I’ve checked everything I can think of within the system, ensuring nothing is holding 3306 open (netstat/lsof), and that no other mysqld processes start or have been started over the period. But it refuses to start regardless, unless I reboot the machine completely.
[INDENT]2014-01-19 22:50:39 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2014-01-19 22:50:39 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2014-01-19 22:50:39 10662 [Note] InnoDB: The InnoDB memory heap is disabled
2014-01-19 22:50:39 10662 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-01-19 22:50:39 10662 [Note] InnoDB: Compressed tables use zlib 1.2.3
2014-01-19 22:50:39 10662 [Note] InnoDB: Using Linux native AIO
2014-01-19 22:50:39 10662 [Note] InnoDB: Using CPU crc32 instructions
2014-01-19 22:50:39 10662 [Note] InnoDB: Initializing buffer pool, size = 90.0G
2014-01-19 22:50:44 10662 [Note] InnoDB: Completed initialization of buffer pool
2014-01-19 22:50:44 10662 [Note] InnoDB: Highest supported file format is Barracuda.
2014-01-19 22:50:47 10662 [Note] InnoDB: 128 rollback segment(s) are active.
2014-01-19 22:50:47 10662 [Note] InnoDB: Waiting for purge to start
2014-01-19 22:50:47 10662 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.15-63.0 started; log sequence number 2854995318219
2014-01-19 22:50:47 10662 [Note] RSA private key file not found: /var/lib/mysql/private_key.pem. Some authentication plugins will not work.
2014-01-19 22:50:47 10662 [Note] RSA public key file not found: /var/lib/mysql/public_key.pem. Some authentication plugins will not work.
2014-01-19 22:50:47 10662 [Note] Server hostname (bind-address): '
'; port: 3306
2014-01-19 22:50:47 10662 [Note] IPv6 is available.
2014-01-19 22:50:47 10662 [Note] - ‘::’ resolves to ‘::’;
2014-01-19 22:50:47 10662 [Note] Server socket created on IP: ‘::’.
2014-01-19 22:50:47 10662 [ERROR] Can’t start server: Bind on TCP/IP port: Address already in use
2014-01-19 22:50:47 10662 [ERROR] Do you already have another mysqld server running on port: 3306 ?
2014-01-19 22:50:47 10662 [ERROR] Aborting[/INDENT]

Once again, it’s fine for months in a row, then randomly shows this behaviour after a large number of restart cycles.

Any idea?

Thanks in advance!

Hi,

Yesterday, I also faced the same issue once. I have checked but there is no any known bug for this. Even this thing is not repeatable every time so its quite difficult to submit bug. Is it possible for you to provide test case when next time it happened? I mean to say collecting all information like output of netstat/lsof/ps -aux | grep mysql etc?

Ah! Found it, I was simply checking listening ports (lsof -Pi | grep LISTEN, netstat -plunt), not actually existing connections that may have still been open
[INDENT][I][root@patchdb05 ~] $ lsof -Pi | grep LISTEN
rpcbind 2452 rpc 8u IPv4 15836 0t0 TCP *:111 (LISTEN)
rpcbind 2452 rpc 11u IPv6 15841 0t0 TCP *:111 (LISTEN)
rpc.statd 2470 rpcuser 9u IPv4 15926 0t0 TCP *:5943 (LISTEN)
rpc.statd 2470 rpcuser 11u IPv6 15934 0t0 TCP *:6432 (LISTEN)
cupsd 7055 root 6u IPv6 19345 0t0 TCP localhost:631 (LISTEN)
cupsd 7055 root 7u IPv4 19346 0t0 TCP localhost:631 (LISTEN)
snmpd 7217 root 8u IPv4 20024 0t0 TCP localhost:199 (LISTEN)
sshd 7229 root 3u IPv4 20061 0t0 TCP *:22 (LISTEN)
sshd 7229 root 4u IPv6 20063 0t0 TCP *:22 (LISTEN)
xinetd 7237 root 5u IPv4 20130 0t0 TCP patchdb05.prod.lan:9204 (LISTEN)
xinetd 7237 root 8u IPv4 20131 0t0 TCP patchdb05.prod.lan:9200 (LISTEN)
searchd 8916 root 7u IPv4 22983 0t0 TCP *:3312 (LISTEN)

[root@patchdb05 ~] $ ps aux | grep mysql
root 27182 0.0 0.0 103252 856 pts/1 S+ 22:07 0:00 grep mysql[/I][/INDENT]

Nothing useful as of yet!

However, a plain netstat showed (amongst everything else): [INDENT]
[root@patchdb05 ~] $ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State

tcp 0 0 patchdb05.prod:appman-server patchdbload01.prod.l:12329 TIME_WAIT
tcp 0 0 patchdb05.prod.lan:mysql patchdb05.prod.lan:mysql ESTABLISHED
tcp 0 0 patchdb05.prod.:jboss-iiop patchdb05.prod.lan:gearman TIME_WAIT
tcp 0 0 patchdb05.prod.lan:ssh patchext01.prod.lan:13740 ESTABLISHED

[/INDENT]
annnnd here it is:
[INDENT]
[root@patchdb05 ~] $ lsof -l -P | grep 3306
search 9557 495 40u IPv4 38329795 0t0 TCP patchdb05.prod.lan:3288->patchdb05.prof.lan:3306 (ESTABLISHED)
[/INDENT]

So, I’m guessing, my search process (an indexer client that runs as part of sphinx search) must have frozen during mysqld restart and didn’t kill itself or time out. Killing the sphinx process and waiting a short period for the connection to timeout and Percona started again straight away.

Hope this helps someone in future! I’ll definitely remember it as it wasn’t an obvious issue this time around!

Thanks for sharing this, It will be helpful for others too. :slight_smile: