Hi,
I have a question about the way to safely kill or recover stacked nodes.
let me explain how I was in trouble and what is the question.
☆ The way I was in trouble
- DB is stucked during the benchmarking. (I saw lock wait timeout warning from the bench marking apllication)
processlist of Master (Node for update) is as follows
mysql> show processlist;
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
| 1 | system user | | NULL | Sleep | 14292 | wsrep aborter idle | NULL | 0 | 0 | 0 |
| 2 | system user | | NULL | Sleep | 14292 | NULL | NULL | 0 | 0 | 0 |
(…)
| 32 | system user | | NULL | Sleep | 14263 | NULL | NULL | 0 | 0 | 0 |
| 33 | system user | | NULL | Sleep | 14263 | NULL | NULL | 0 | 0 | 0 |
| 37 | tpcc | 192.168.10.45:41469 | tpcc | Killed | 12945 | wsrep in pre-commit stage | commit | 0 | 0 | 0 |
( many thread of pre-commit stage threads )
| 42 | tpcc | 192.168.10.45:41474 | tpcc | Query | 12945 | wsrep in pre-commit stage | commit | 0 | 0 | 0 |
(…)
| 69 | root | localhost | NULL | Query | 7669 | Preparing for TO isolation | create database test1 | 0 | 0 | 0 |
| 72 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 0 |
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
55 rows in set (0.00 sec)
Id 69 is query so that I confirm whether I can update DB. ( this query seems to be stucked )
- So I checked transaction state by referencing select * from information_schema.innodb_trx;
** transaction of master node is empty (this is OK)
mysql> select * from information_schema.innodb_trx;
Empty set (0.00 sec)
** transaction of slave nodes is not empty and seems to be stucked. (this also is OK).
*************************** 29. row ***************************
trx_id: 1B01
trx_state: RUNNING
trx_started: 2014-07-17 14:19:53
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 11
trx_mysql_thread_id: 6
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 7
trx_lock_memory_bytes: 1248
trx_rows_locked: 3
trx_rows_modified: 4
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_wsrep_seqno: 5594362
trx_query_state: idle
trx_conflict_state: no conflict
trx_exec_mode: applier
trx_consistency_check: None
*************************** 30. row ***************************
trx_id: 1B00
trx_state: RUNNING
trx_started: 2014-07-17 14:19:53
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 11
trx_mysql_thread_id: 4
trx_query: NULL ☆ NULL !
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 7
trx_lock_memory_bytes: 1248
trx_rows_locked: 3
trx_rows_modified: 4
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_wsrep_seqno: 5594361
trx_query_state: idle
trx_conflict_state: no conflict
trx_exec_mode: applier
trx_consistency_check: None
30 rows in set (0.01 sec)
mysql> system date;
Thu Jul 17 18:21:36 JST 2014
** process list of slave node is as follows (slave side have the position of write set)
mysql> show processlist;
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
| 1 | system user | | NULL | Sleep | 14461 | applied write set 5594368 | NULL | 0 | 0 | 0 |
| 2 | system user | | NULL | Sleep | 14871 | wsrep aborter idle | NULL | 0 | 0 | 0 |
| 3 | system user | | NULL | Sleep | 14461 | applied write set 5594365 | NULL | 0 | 0 | 0 |
(many threads)
| 33 | system user | | NULL | Sleep | 14461 | applied write set 5594392 | NULL | 0 | 0 | 0 |
| 36 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 0 |
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
34 rows in set (0.01 sec)
☆ Question
Is there any recommended way to manage without restarting the node, when some transaction seem to be stucked?
I confirmed
- I can’t kill wsrep_slave_threads
- I can kill original master side thread to stuck
(Even If we can fix transaction status among the cluster by this operation, there is the problen that I don’t have any information to know which thread to kill because there seems to be no information to link the slave thread.)