How to safely kill specific transaction

Hi,
I have a question about the way to safely kill or recover stacked nodes.

let me explain how I was in trouble and what is the question.

☆ The way I was in trouble

  1. DB is stucked during the benchmarking. (I saw lock wait timeout warning from the bench marking apllication)

processlist of Master (Node for update) is as follows

mysql> show processlist;
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
| 1 | system user | | NULL | Sleep | 14292 | wsrep aborter idle | NULL | 0 | 0 | 0 |
| 2 | system user | | NULL | Sleep | 14292 | NULL | NULL | 0 | 0 | 0 |
(…)
| 32 | system user | | NULL | Sleep | 14263 | NULL | NULL | 0 | 0 | 0 |
| 33 | system user | | NULL | Sleep | 14263 | NULL | NULL | 0 | 0 | 0 |
| 37 | tpcc | 192.168.10.45:41469 | tpcc | Killed | 12945 | wsrep in pre-commit stage | commit | 0 | 0 | 0 |
( many thread of pre-commit stage threads )
| 42 | tpcc | 192.168.10.45:41474 | tpcc | Query | 12945 | wsrep in pre-commit stage | commit | 0 | 0 | 0 |
(…)
| 69 | root | localhost | NULL | Query | 7669 | Preparing for TO isolation | create database test1 | 0 | 0 | 0 |
| 72 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 0 |
±—±------------±--------------------±-----±--------±------±---------------------------±----------------------±----------±--------------±----------+
55 rows in set (0.00 sec)

Id 69 is query so that I confirm whether I can update DB. ( this query seems to be stucked )

  1. So I checked transaction state by referencing select * from information_schema.innodb_trx;

** transaction of master node is empty (this is OK)

mysql> select * from information_schema.innodb_trx;
Empty set (0.00 sec)

** transaction of slave nodes is not empty and seems to be stucked. (this also is OK).

*************************** 29. row ***************************
trx_id: 1B01
trx_state: RUNNING
trx_started: 2014-07-17 14:19:53
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 11
trx_mysql_thread_id: 6
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 7
trx_lock_memory_bytes: 1248
trx_rows_locked: 3
trx_rows_modified: 4
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_wsrep_seqno: 5594362
trx_query_state: idle
trx_conflict_state: no conflict
trx_exec_mode: applier
trx_consistency_check: None
*************************** 30. row ***************************
trx_id: 1B00
trx_state: RUNNING
trx_started: 2014-07-17 14:19:53
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 11
trx_mysql_thread_id: 4
trx_query: NULL ☆ NULL !
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 7
trx_lock_memory_bytes: 1248
trx_rows_locked: 3
trx_rows_modified: 4
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_wsrep_seqno: 5594361
trx_query_state: idle
trx_conflict_state: no conflict
trx_exec_mode: applier
trx_consistency_check: None
30 rows in set (0.01 sec)

mysql> system date;
Thu Jul 17 18:21:36 JST 2014

** process list of slave node is as follows (slave side have the position of write set)
mysql> show processlist;
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
| 1 | system user | | NULL | Sleep | 14461 | applied write set 5594368 | NULL | 0 | 0 | 0 |
| 2 | system user | | NULL | Sleep | 14871 | wsrep aborter idle | NULL | 0 | 0 | 0 |
| 3 | system user | | NULL | Sleep | 14461 | applied write set 5594365 | NULL | 0 | 0 | 0 |
(many threads)
| 33 | system user | | NULL | Sleep | 14461 | applied write set 5594392 | NULL | 0 | 0 | 0 |
| 36 | root | localhost | NULL | Query | 0 | sleeping | show processlist | 0 | 0 | 0 |
±—±------------±----------±-----±--------±------±--------------------------±-----------------±----------±--------------±----------+
34 rows in set (0.01 sec)

☆ Question
Is there any recommended way to manage without restarting the node, when some transaction seem to be stucked?

I confirmed

  • I can’t kill wsrep_slave_threads
  • I can kill original master side thread to stuck
    (Even If we can fix transaction status among the cluster by this operation, there is the problen that I don’t have any information to know which thread to kill because there seems to be no information to link the slave thread.)

I tested on repl.commit_order=2.