Hi All,
I’m running a master/slave setup using Percona 5.7.17-11-1 with GTID + semi-sync replication enabled. More or less once a week the thread pool for the replication user on the master, is overloaded by Binlog Dump GTID threads in Waiting to finalize termination or starting state. Unfortunately I still don’t have a repro but this is what I see when the issue happens:
- semi-sync repl fails for a replica & semi-sync binlog_dump is stopped on master
2017-06-19T10:12:58.501501Z 864 [ERROR] Semi-sync master failed on net_flush() before waiting for slave reply
2017-06-19T10:12:58.501533Z 864 [Note] Stop semi-sync binlog_dump to slave (server_id: 1019420603)
- slave reconnect to the master starting a new binlog_dump thread and while initializing a new dump thread for the slave, the master find a zombie dump thread with the same UUID
2017-06-19T10:14:48.993726Z 84362 [Note] While initializing dump thread for slave with UUID <787d319f-33db-11e7-9c43-122ca6236436>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(864).
2017-06-19T10:14:48.993888Z 84362 [Note] Start binlog_dump to master_thread_id(84362) slave_server(1019420603), pos(, 4)
- Binlog Dump GTID thread is hanging forever keeping a connection
Here’s the full processlist after several iterations of the above
Id User Host db Command Time State Info Rows_sent Rows_examined
23738 vt_repl_a 10.3.55.166:43566 NULL Binlog Dump GTID 613992 Master has sent all binlog to slave; waiting for more updates NULL 0 0
84394 vt_repl_a 10.3.197.55:59470 NULL Binlog Dump GTID 17998 Waiting to finalize termination NULL 0 0
84441 vt_repl_a 10.3.197.55:60020 NULL Binlog Dump GTID 17718 Waiting to finalize termination NULL 0 0
84521 vt_repl_a 10.3.197.55:60938 NULL Binlog Dump GTID 17228 Waiting to finalize termination NULL 0 0
84592 vt_repl_a 10.3.197.55:61730 NULL Binlog Dump GTID 16807 Waiting to finalize termination NULL 0 0
84630 vt_repl_a 10.3.197.55:62160 NULL Binlog Dump GTID 16597 Waiting to finalize termination NULL 0 0
84641 vt_repl_a 10.3.197.55:62282 NULL Binlog Dump GTID 16527 Waiting to finalize termination NULL 0 0
84651 vt_repl_a 10.3.197.55:62406 NULL Binlog Dump GTID 16457 Waiting to finalize termination NULL 0 0
84663 vt_repl_a 10.3.197.55:62526 NULL Binlog Dump GTID 16387 Waiting to finalize termination NULL 0 0
84675 vt_repl_a 10.3.197.55:62654 NULL Binlog Dump GTID 16317 Waiting to finalize termination NULL 0 0
84685 vt_repl_a 10.3.197.55:62774 NULL Binlog Dump GTID 16247 Waiting to finalize termination NULL 0 0
84747 vt_repl_a 10.3.197.55:63450 NULL Binlog Dump GTID 15896 Waiting to finalize termination NULL 0 0
84784 vt_repl_a 10.3.197.55:63878 NULL Binlog Dump GTID 15686 Waiting to finalize termination NULL 0 0
84895 vt_repl_a 10.3.197.55:65166 NULL Binlog Dump GTID 14985 Waiting to finalize termination NULL 0 0
84911 vt_repl_a 10.3.197.55:65346 NULL Binlog Dump GTID 14915 Waiting to finalize termination NULL 0 0
85923 vt_repl_a 10.3.197.55:20114 NULL Binlog Dump GTID 10883 Waiting to finalize termination NULL 0 0
85934 vt_repl_a 10.3.197.55:20236 NULL Binlog Dump GTID 10813 Waiting to finalize termination NULL 0 0
86220 vt_repl_a 10.3.197.55:22682 NULL Binlog Dump GTID 9622 Waiting to finalize termination NULL 0 0
86231 vt_repl_a 10.3.197.55:22800 NULL Binlog Dump GTID 9552 Waiting to finalize termination NULL 0 0
87098 vt_repl_a 10.3.197.55:29746 NULL Binlog Dump GTID 6181 starting NULL 0 0
87109 vt_repl_a 10.3.197.55:29868 NULL Binlog Dump GTID 6111 starting NULL 0 0
87125 vt_repl_a 10.3.197.55:30046 NULL Binlog Dump GTID 6040 starting NULL 0 0
87135 vt_repl_a 10.3.197.55:30164 NULL Binlog Dump GTID 5970 starting NULL 0 0
87146 vt_repl_a 10.3.197.55:30288 NULL Binlog Dump GTID 5900 starting NULL 0 0
87158 vt_repl_a 10.3.197.55:30416 NULL Binlog Dump GTID 5830 starting NULL 0 0
87168 vt_repl_a 10.3.197.55:30538 NULL Binlog Dump GTID 5760 starting NULL 0 0
87179 vt_repl_a 10.3.197.55:30660 NULL Binlog Dump GTID 5690 starting NULL 0 0
87195 vt_repl_a 10.3.197.55:30838 NULL Binlog Dump GTID 5620 starting NULL 0 0
87205 vt_repl_a 10.3.197.55:30968 NULL Binlog Dump GTID 5550 starting NULL 0 0
87216 vt_repl_a 10.3.197.55:31088 NULL Binlog Dump GTID 5480 starting NULL 0 0
87227 vt_repl_a 10.3.197.55:31212 NULL Binlog Dump GTID 5410 starting NULL 0 0
87237 vt_repl_a 10.3.197.55:31332 NULL Binlog Dump GTID 5340 starting NULL 0 0
87248 vt_repl_a 10.3.197.55:31460 NULL Binlog Dump GTID 5270 starting NULL 0 0
87264 vt_repl_a 10.3.197.55:31640 NULL Binlog Dump GTID 5200 starting NULL 0 0
87274 vt_repl_a 10.3.197.55:31762 NULL Binlog Dump GTID 5130 Waiting to finalize termination NULL 0 0
87285 vt_repl_a 10.3.197.55:31884 NULL Binlog Dump GTID 5060 starting NULL 0 0
87296 vt_repl_a 10.3.197.55:32006 NULL Binlog Dump GTID 4990 starting NULL 0 0
87306 vt_repl_a 10.3.197.55:32134 NULL Binlog Dump GTID 4919 starting NULL 0 0
87317 vt_repl_a 10.3.197.55:32254 NULL Binlog Dump GTID 4849 starting NULL 0 0
87333 vt_repl_a 10.3.197.55:32434 NULL Binlog Dump GTID 4779 starting NULL 0 0
87343 vt_repl_a 10.3.197.55:32558 NULL Binlog Dump GTID 4709 Waiting to finalize termination NULL 0 0
87354 vt_repl_a 10.3.197.55:32682 NULL Binlog Dump GTID 4639 starting NULL 0 0
87365 vt_repl_a 10.3.197.55:32806 NULL Binlog Dump GTID 4569 starting NULL 0 0
87375 vt_repl_a 10.3.197.55:32932 NULL Binlog Dump GTID 4499 starting NULL 0 0
87386 vt_repl_a 10.3.197.55:33056 NULL Binlog Dump GTID 4429 starting NULL 0 0
87402 vt_repl_a 10.3.197.55:33240 NULL Binlog Dump GTID 4359 starting NULL 0 0
87413 vt_repl_a 10.3.197.55:33362 NULL Binlog Dump GTID 4289 starting NULL 0 0
87423 vt_repl_a 10.3.197.55:33484 NULL Binlog Dump GTID 4219 starting NULL 0 0
87434 vt_repl_a 10.3.197.55:33606 NULL Binlog Dump GTID 4149 starting NULL 0 0