Hi guys, I have a serious issue and need some help. Any advice is appreciated.
CentOS release 6.4 (Final) Linux version 2.6.32-358.18.1.el6.centos.plus.x86_64
Percona version: 5.5.33-rel31.1
Server: 32 cores 64GB RAM
Currently I’m doing stress test for 8 percona servers, while there’s 4 proxy servers transmit the request to mysql (just like mysql-proxy)
[COLOR=#FF0000]There are four mm group for 8 mysql. Each two is a mm.
What the test do is just imitating sending messages and update the last receive time in mysql. So for mysql there’s only one update sql
like this:
[COLOR=#FF0000]update private_message_status set last_read_message_cursor=1384915973595 where user_id=1352394 and from_user_id=1352394;
Here’s the schema
CREATE TABLE private_message_status
(
user_id
bigint(20) NOT NULL,
from_user_id
bigint(20) NOT NULL,
last_read_message_cursor
bigint(20) DEFAULT ‘0’,
enable_notification
tinyint(1) DEFAULT ‘1’,
PRIMARY KEY (user_id
,from_user_id
)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
And for each server, there’s about 460 updates /s
There’s occasionally slow sql which takes time more than [COLOR=#FF0000]0.1s, sometimes a few hours, but sometimes a few hours, there’s some slow sql.
But sometimes there’s a big problem that it can take [COLOR=#FF0000]2-10s, or[COLOR=#FF0000] 20-30s, occasionally around [COLOR=#FF0000]180s. Every server has this problem. [HR][/HR].
.
.
update private_message_status set last_read_message_cursor=1385047291697 where user_id=1344051 and from_user_id=1344051;
User@Host: cmmw[cmmw] @ [172.21.27.81]
Thread_id: 159395 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 180.108397 Lock_time: 0.000076 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047292266 where user_id=1350252 and from_user_id=1350252;
User@Host: cmmw[cmmw] @ [172.21.27.81]
Thread_id: 159459 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 179.371043 Lock_time: 0.025446 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047293027 where user_id=1351216 and from_user_id=1351216;
User@Host: cmmw[cmmw] @ [172.21.27.81]
Thread_id: 159258 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 180.901386 Lock_time: 0.000094 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291488 where user_id=1338937 and from_user_id=1338937;
User@Host: cmmw[cmmw] @ [172.21.28.131]
Thread_id: 159415 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 179.896060 Lock_time: 0.000087 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047292453 where user_id=1351169 and from_user_id=1351169;
User@Host: cmmw[cmmw] @ [172.21.27.81]
Thread_id: 159291 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 180.739492 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291823 where user_id=1346144 and from_user_id=1346144;
User@Host: cmmw[cmmw] @ [172.21.28.131]
Thread_id: 159331 Schema: shanliao5 Last_errno: 0 Killed: 0
Query_time: 180.486669 Lock_time: 0.000048 Rows_sent: 0 Rows_examined: 1 Rows_affected: 1 Rows_read: 1
Bytes_sent: 52
SET timestamp=1385047472;
update private_message_status set last_read_message_cursor=1385047291902 where user_id=1336834 and from_user_id=1336834;
.
.
. [HR][/HR]
Then I did a crontab to collect the status, for every 5 seconds(I know in nomal situation, it should have a inteveral like 30s, but here 5 can see the problem clearly)
when the problem comes, the ‘show processlist’ and ‘show engine innodb status’
shows abnormal status
There’s about 600 threads in show processlist, most of which hang in query end status;
Query 4 query end update private_message_status set last_read_message_cursor=1384915973062 where user_id=1347239 and from_user_id=1347239 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973035 where user_id=1340131 and from_user_id=1340131 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973064 where user_id=1339030 and from_user_id=1339030 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973009 where user_id=1346218 and from_user_id=1346218 0 1 1
Query 4 query end update private_message_status set last_read_message_cursor=1384915973032 where user_id=1347227 and from_user_id=1347227 0 1 1
.
.
.
Query 1 Updating update private_message_status set last_read_message_cursor=1384915976882 where user_id=1336038 and from_user_id=1336038 0 0 0
Query 1 Updating update private_message_status set last_read_message_cursor=1384915976865 where user_id=1350313 and from_user_id=1350313 0 0 0
Query 1 query end update private_message_status set last_read_message_cursor=1384915977011 where user_id=1339042 and from_user_id=1339042 0 1 1
Query 0 Updating update private_message_status set last_read_message_cursor=1384915977123 where user_id=1340127 and from_user_id=1340127 0 0 0
Query 0 query end update private_message_status set last_read_message_cursor=1384915977147 where user_id=1349375 and from_user_id=1349375 0 1 1
[COLOR=#FF0000]Threads: 583 Questions: 45797267 Slow queries: 46 Opens: 701 Flush tables: 1 Open tables: 579 Queries per second avg: 745.034
The OS status
top - 10:52:57 up 43 days, 27 min, 1 user, load average: 0.40, 0.13, 0.03
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65937652k total, 28454740k used, 37482912k free, 225704k buffers
Swap: 8388600k total, 0k used, 8388600k free, 14169812k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
467 root 39 19 0 0 0 S 1.9 0.0 666:30.75 kipmi0
[COLOR=#FF0000]2026 mysql 20 0 57.3g 11g 5796 S 1.9 18.7 138:50.91 mysqld
1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd
while in normal status
top - 10:52:52 up 43 days, 27 min, 1 user, load average: 0.09, 0.06, 0.01
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65937652k total, 28394284k used, 37543368k free, 225700k buffers
Swap: 8388600k total, 0k used, 8388600k free, 14177696k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
[COLOR=#FF0000]2026 mysql 20 0 44.3g 11g 5796 S 11.7 18.7 138:50.26 mysqld
12552 root 20 0 15424 1504 828 R 3.9 0.0 0:00.02 top
467 root 39 19 0 0 0 S 1.9 0.0 666:30.68 kipmi0
1 root 20 0 19228 1504 1216 S 0.0 0.0 0:02.51 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd