simple update sometimes stuck for about 180s when stress test

Hi guys, I have a serious issue and need some help. Any advice is appreciated.

CentOS release 6.4 (Final) Linux version 2.6.32-358.18.1.el6.centos.plus.x86_64
Percona version: 5.5.33-rel31.1
Server: 32 cores 64GB RAM

Currently I’m doing stress test for 8 percona servers, while there’s 4 proxy servers transmit the request to mysql (just like mysql-proxy)
[COLOR=#FF0000]There are four mm group for 8 mysql. Each two is a mm.

What the test do is just imitating sending messages and update the last receive time in mysql. So for mysql there’s only one update sql
like this:
[COLOR=#FF0000]update private_message_status set last_read_message_cursor=1384915973595 where user_id=1352394 and from_user_id=1352394;

Here’s the schema
CREATE TABLE private_message_status (
user_id bigint(20) NOT NULL,
from_user_id bigint(20) NOT NULL,
last_read_message_cursor bigint(20) DEFAULT ‘0’,
enable_notification tinyint(1) DEFAULT ‘1’,
PRIMARY KEY (user_id,from_user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

And for each server, there’s about 460 updates /s

There’s occasionally slow sql which takes time more than [COLOR=#FF0000]0.1s, sometimes a few hours, but sometimes a few hours, there’s some slow sql.
But sometimes there’s a big problem that it can take [COLOR=#FF0000]2-10s, or[COLOR=#FF0000] 20-30s, occasionally around [COLOR=#FF0000]180s. Every server has this problem. [HR][/HR].
.
.
update private_message_status set last_read_message_cursor=1385047291697 where user_id=1344051 and from_user_id=1344051;

User@Host: cmmw[cmmw] @ [172.21.27.81]

Thread_id: 159395 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 180.108397 Lock_time: 0.000076 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047292266 where user_id=1350252 and from_user_id=1350252;

User@Host: cmmw[cmmw] @ [172.21.27.81]

Thread_id: 159459 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 179.371043 Lock_time: 0.025446 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047293027 where user_id=1351216 and from_user_id=1351216;

User@Host: cmmw[cmmw] @ [172.21.27.81]

Thread_id: 159258 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 180.901386 Lock_time: 0.000094 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291488 where user_id=1338937 and from_user_id=1338937;

User@Host: cmmw[cmmw] @ [172.21.28.131]

Thread_id: 159415 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 179.896060 Lock_time: 0.000087 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047292453 where user_id=1351169 and from_user_id=1351169;

User@Host: cmmw[cmmw] @ [172.21.27.81]

Thread_id: 159291 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 180.739492 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291823 where user_id=1346144 and from_user_id=1346144;

User@Host: cmmw[cmmw] @ [172.21.28.131]

Thread_id: 159331 Schema: shanliao5 Last_errno: 0 Killed: 0

Query_time: 180.486669 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1

Bytes_sent: 52

SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291902 where user_id=1336834 and from_user_id=1336834;
.
.
. [HR][/HR]

Then I did a crontab to collect the status, for every 5 seconds(I know in nomal situation, it should have a inteveral like 30s, but here 5 can see the problem clearly)
when the problem comes, the ‘show processlist’ and ‘show engine innodb status’
shows abnormal status

There’s about 600 threads in show processlist, most of which hang in query end status;

Query 4 query end update private_message_status set last_read_message_cursor=1384915973062 where user_id=1347239 and from_user_id=1347239 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973035 where user_id=1340131 and from_user_id=1340131 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973064 where user_id=1339030 and from_user_id=1339030 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973009 where user_id=1346218 and from_user_id=1346218 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973032 where user_id=1347227 and from_user_id=1347227 0 1 1
.
.
.
Query 1 Updating update private_message_status set last_read_message_cursor=1384915976882 where user_id=1336038 and from_user_id=1336038 0 0 0
Query 1 Updating update private_message_status set last_read_message_cursor=1384915976865 where user_id=1350313 and from_user_id=1350313 0 0 0
Query 1 query end update private_message_status set last_read_message_cursor=1384915977011 where user_id=1339042 and from_user_id=1339042 0 1 1
Query 0 Updating update private_message_status set last_read_message_cursor=1384915977123 where user_id=1340127 and from_user_id=1340127 0 0 0
Query 0 query end update private_message_status set last_read_message_cursor=1384915977147 where user_id=1349375 and from_user_id=1349375 0 1 1

[COLOR=#FF0000]Threads: 583 Questions: 45797267 Slow queries: 46 Opens: 701 Flush tables: 1 Open tables: 579 Queries per second avg: 745.034

The OS status
top - 10:52:57 up 43 days, 27 min, 1 user, load average: 0.40, 0.13, 0.03
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65937652k total, 28454740k used, 37482912k free, 225704k buffers
Swap: 8388600k total, 0k used, 8388600k free, 14169812k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
467 root 39 19 0 0 0 S 1.9 0.0 666:30.75 kipmi0
[COLOR=#FF0000]2026 mysql 20 0 57.3g 11g 5796 S 1.9 18.7 138:50.91 mysqld
1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd

while in normal status
top - 10:52:52 up 43 days, 27 min, 1 user, load average: 0.09, 0.06, 0.01
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65937652k total, 28394284k used, 37543368k free, 225700k buffers
Swap: 8388600k total, 0k used, 8388600k free, 14177696k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[COLOR=#FF0000]2026 mysql 20 0 44.3g 11g 5796 S 11.7 18.7 138:50.26 mysqld
12552 root 20 0 15424 1504 828 R 3.9 0.0 0:00.02 top
467 root 39 19 0 0 0 S 1.9 0.0 666:30.68 kipmi0
1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd

And the[COLOR=#FF0000] ‘show engine innodb status’
BACKGROUND THREAD

srv_master_thread loops: 61339 1_second, 61339 sleeps, 6133 10_second, 4 background, 4 flush
srv_master_thread log flush and writes: 61537

SEMAPHORES

OS WAIT ARRAY INFO: reservation count 22619, signal count 66413
Mutex spin waits 3527105, rounds 19531558, OS waits 11825
RW-shared spins 29266, rounds 272872, OS waits 6578
RW-excl spins 73082, rounds 383646, OS waits 2684
Spin rounds per wait: 5.54 mutex, 9.32 RW-shared, 5.25 RW-excl

FILE I/O

I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o’s: 0, sync i/o’s: 0
[COLOR=#FF0000]Pending flushes (fsync) log: 1; buffer pool: 0
1180 OS file reads, 43169133 OS file writes, 88320 OS fsyncs
0.00 reads/s, 0 avg bytes/read, [COLOR=#FF0000]133.37 writes/s, 0.00 fsyncs/s

INSERT BUFFER AND ADAPTIVE HASH INDEX

Ibuf: size 1, free list len 0, seg size 2, 4 merges
merged operations:
insert 3, delete mark 4, delete 3
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 80060237, node heap has 10 buffer(s)
205.76 hash searches/s, 1.20 non-hash searches/s

LOG

Log sequence number 292140682848
Log flushed up to 292140021164
Last checkpoint at 292101169700
Max checkpoint age 166798910
Checkpoint age target 161586445
Modified age 39513148
Checkpoint age 39513148
[COLOR=#FF0000]1 pending log writes, 0 pending chkp writes
42850100 log i/o’s done, [COLOR=#FF0000]133.37 log i/o’s/second

BUFFER POOL AND MEMORY

Total memory allocated 41540485120; in additional pool allocated 0
Total memory allocated by read views 4216
Internal hash tables (constant factor + variable factor)
Adaptive hash index 640649952 (640481896 + 168056)
Page hash 40031048 (buffer pool 0 only)
Dictionary cache 160578044 (160122448 + 455596)
File system 115504 (82672 + 32832)
Lock system 100298272 (100076056 + 222216)
Recovery system 0 (0 + 0)
Dictionary memory allocated 455596
Buffer pool size 2469119
Buffer pool size, bytes 40454045696
Free buffers 2467551
Database pages 1558
Old database pages 555
Modified db pages 720
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 1, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1115, created 443, written 312879
0.00 reads/s, 43.79 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1558, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]

ROW OPERATIONS

0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
587 transactions active inside InnoDB
587 out of 1000 descriptors used
—OLDEST VIEW—
Normal read view
Read view low limit trx n:o 487B8735
Read view up limit trx id 487B86C1
Read view low limit trx id 487B8735
Read view individually stored trx ids:
Read view trx id 487B8734
Read view trx id 487B8733
Read view trx id 487B8732
Read view trx id 487B8731
Read view trx id 487B8730
Read view trx id 487B872F
Read view trx id 487B8727
.
.
.
.
.
.
Read view trx id 487B86C3
Read view trx id 487B86C2
Read view trx id 487B86C1

Main thread process no. 2026, id 139971502806784, state: flushing log
Number of rows inserted 42, updated 29529289, deleted 6, read 29550470
0.00 inserts/s, 193.96 updates/s, 0.00 deletes/s, 193.96 reads/s

TRANSACTIONS

Trx id counter 487B890D
Purge done for trx’s n:o < 487B86C1 undo n:o < 0
History list length 2409
LIST OF TRANSACTIONS FOR EACH SESSION:
—TRANSACTION 0, not started
MySQL thread id 48637, OS thread handle 0x7f499ba69700, query id 45797309 localhost root
SHOW ENGINE INNODB STATUS
—TRANSACTION 487B7D27, not started
MySQL thread id 5459, OS thread handle 0x7f4da632f700, query id 45792517 172.21.28.131 cmmw
—TRANSACTION 487B77B9, not started
MySQL thread id 5415, OS thread handle 0x7f57a4227700, query id 45791443 172.21.27.81 cmmw
—TRANSACTION 487B86B7, not started
MySQL thread id 2, OS thread handle 0x7f4da66bd700, query id 45794398 Slave has read all relay log; waiting for the slave I/O thread to update it
—TRANSACTION 487B890C, ACTIVE (PREPARED) 0 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 48636, OS thread handle 0x7f499baaa700, query id 45797307 172.21.28.130 cmmw query end
update private_message_status set last_read_message_cursor=1384915977654 where user_id=1346271 and from_user_id=1346271
TABLE LOCK table shanliao6.private_message_status trx id 487B890C lock mode IX
RECORD LOCKS space id 280 page no 22 n bits 304 index PRIMARY of table shanliao6.private_message_status trx id 487B890C lock_mode X locks rec but not gap
—TRANSACTION 487B890B, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 48635, OS thread handle 0x7f499baeb700, query id 45797302 172.21.28.130 cmmw Updating
update private_message_status set last_read_message_cursor=1384915977649 where user_id=1356481 and from_user_id=1356481
------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 280 page no 19 n bits 360 index PRIMARY of table shanliao6.private_message_status trx id 487B890B lock_mode X locks rec but not gap waiting

TABLE LOCK table shanliao6.private_message_status trx id 487B890B lock mode IX
RECORD LOCKS space id 280 page no 19 n bits 360 index PRIMARY of table shanliao6.private_message_status trx id 487B890B lock_mode X locks rec but not gap waiting
—TRANSACTION 487B890A, ACTIVE (PREPARED) 0 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 48634, OS thread handle 0x7f499bb2c700, query id 45797297 172.21.28.130 cmmw query end
update private_message_status set last_read_message_cursor=1384915977638 where user_id=1354394 and from_user_id=1354394
TABLE LOCK table shanliao6.private_message_status trx id 487B890A lock mode IX
RECORD LOCKS space id 280 page no 14 n bits 376 index PRIMARY of table shanliao6.private_message_status trx id 487B890A lock_mode X locks rec but not gap
—TRANSACTION 487B8909, ACTIVE (PREPARED) 0 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 48633, OS thread handle 0x7f499bb6d700, query id 45797292 172.21.28.131 cmmw query end
update private_message_status set last_read_message_cursor=1384915977449 where user_id=1339051 and from_user_id=1339051
TABLE LOCK table shanliao6.private_message_status trx id 487B8909 lock mode IX
RECORD LOCKS space id 280 page no 25 n bits 344 index PRIMARY of table shanliao6.private_message_status trx id 487B8909 lock_mode X locks rec but not gap
—TRANSACTION 487B8908, ACTIVE (PREPARED) 0 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 48632, OS thread handle 0x7f499bbae700, query id 45797287 172.21.28.131 cmmw query end
update private_message_status set last_read_message_cursor=1384915977325 where user_id=1352440 and from_user_id=1352440
TABLE LOCK table shanliao6.private_message_status trx id 487B8908 lock mode IX
RECORD LOCKS space id 280 page no 14 n bits 376 index PRIMARY of table shanliao6.private_message_status trx id 487B8908 lock_mode X locks rec but not gap

.
.
.
.
.
.
—TRANSACTION 487B86C2, ACTIVE (PREPARED) 4 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 45434, OS thread handle 0x7f4da61ea700, query id 45794403 172.21.28.130 cmmw query end
update private_message_status set last_read_message_cursor=1384915973009 where user_id=1346218 and from_user_id=1346218
TABLE LOCK table shanliao6.private_message_status trx id 487B86C2 lock mode IX
RECORD LOCKS space id 280 page no 22 n bits 304 index PRIMARY of table shanliao6.private_message_status trx id 487B86C2 lock_mode X locks rec but not gap
—TRANSACTION 487B86C1, ACTIVE (PREPARED) 4 sec preparing
mysql tables in use 1, locked 1
2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 45748, OS thread handle 0x7f57a4268700, query id 45794402 172.21.27.81 cmmw query end
update private_message_status set last_read_message_cursor=1384915973032 where user_id=1347227 and from_user_id=1347227
TABLE LOCK table shanliao6.private_message_status trx id 487B86C1 lock mode IX
RECORD LOCKS space id 280 page no 22 n bits 304 index PRIMARY of table shanliao6.private_message_status trx id 487B86C1 lock_mode X locks rec but not gap

END OF INNODB MONITOR OUTPUT

mysql my.cnf
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
back_log = 2000

key_buffer = 256M
max_allowed_packet = 2M
table_cache = 512
join_buffer_size = 64M
sort_buffer_size = 32M
read_buffer_size = 32M
read_rnd_buffer_size = 64M
myisam_sort_buffer_size = 64M
query_cache_size = 64M
query_cache_limit = 8M
max_tmp_tables = 128
tmp_table_size = 256M
thread_cache_size = 256
thread_concurrency = 4
wait_timeout = [COLOR=#0000FF]3600
interactive_timeout = [COLOR=#0000FF]3600
myisam_repair_threads = 1
myisam-recover = DEFAULT

slow_query_log = 1
long_query_time = 0.1

low_priority_updates
delay_key_write = ALL

max_connections = 1024
max_connect_errors = 99999999

skip-name-resolve
relay-log=relay-bin

default_storage_engine = InnoDB

innodb_file_per_table
innodb_file_format = “Barracuda”
innodb_data_home_dir = /data/mysqldata/
innodb_data_file_path = ibdata1:100M:autoextend
innodb_log_group_home_dir = /data/mysqldata/
innodb_buffer_pool_size = 38580M
innodb_additional_mem_pool_size = 100M
innodb_log_file_size = [COLOR=#0000FF]1024M
innodb_log_buffer_size = [COLOR=#0000FF]128M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_thread_concurrency = 40
innodb_flush_method = O_DIRECT
#innodb_force_recovery = 6
#transaction-isolation = READ-COMMITTED
innodb_io_capacity = 500
auto_increment_offset=2
auto_increment_increment=2

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 8M
write_buffer = 8M

I know there’s one setting may have some problem
[COLOR=#FF0000]innodb_log_file_size = 100M
According to this http://www.mysqlperformanceblog.com/…log-file-size/
I cauculated that our server should be [COLOR=#FF0000]innodb_log_file_size = 1024M,
I already choose one server to apply this setting and wait to see what will happen
[COLOR=#0000FF]Ps: This doesn’t fix the problem at all, the server still has this problem.

This issue has scratched my head for a few weeks, hope someone can give me some idea, thks.

I forgot an important thing, table ‘[COLOR=#252C2F]private_message_status’ only has around 2500 records for every server.