Increasing load avg after upgrade while system stays responsive

Hi all,

I have two MySQL/DRBD setups (two separate setups with the same symptoms), and after upgrading to the latest 5.5 (5.5.32-rel31.0-549.lucid) I’m seeing something strange. The affected machines run Ubuntu 10.04 LTS. The load average increases over time and never comes back down, yet the system stays very responsive. Normally things would be slow with a load of 10+ like in the example below, but it feels like the machine is under its normal load, which is usually around 2 to 3.

The machines have been running this setup for almost 3 years now, and it was inspected and fine-tuned by a Percona consultant.

I don’t have this issue on machines running the same Ubuntu version but without Percona server. I also don’t have this issue on machines running 12.04 with Percona server.

It looks like this only happens on Ubuntu 10.04 with the latest kernel (2.6.32-49-server) and the latest Percona (5.5.32-rel31.0-549.lucid). All my systems are x86_64.

Top snippet below. The load average is high, but per-CPU utilization is low; I/O and memory look fine.

top - 09:27:42 up 10:57, 1 user, load average: 10.45, 10.27, 10.07
Tasks: 417 total, 1 running, 416 sleeping, 0 stopped, 0 zombie
Cpu0 : 12.3%us, 1.0%sy, 0.0%ni, 84.4%id, 0.3%wa, 0.0%hi, 1.9%si, 0.0%st
Cpu1 : 7.1%us, 2.2%sy, 0.3%ni, 88.5%id, 0.3%wa, 0.0%hi, 1.6%si, 0.0%st
Cpu2 : 0.3%us, 0.6%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 8.9%us, 0.7%sy, 0.0%ni, 89.1%id, 0.3%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu4 : 13.1%us, 1.6%sy, 0.0%ni, 84.6%id, 0.3%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 15.4%us, 1.3%sy, 0.0%ni, 83.0%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 5.0%us, 0.3%sy, 0.0%ni, 94.4%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 0.0%us, 1.6%sy, 0.0%ni, 98.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 1.3%us, 0.3%sy, 0.0%ni, 98.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu12 : 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 12.8%us, 1.3%sy, 0.0%ni, 85.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu14 : 7.1%us, 1.0%sy, 0.0%ni, 90.9%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu15 : 5.1%us, 1.0%sy, 0.0%ni, 93.6%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99197468k total, 59306892k used, 39890576k free, 630384k buffers
Swap: 7811064k total, 0k used, 7811064k free, 16108860k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3907 mysql 20 0 80.8g 37g 7748 S 99 39.4 222:48.39 mysqld
1489 zabbix 25 5 64464 2168 1652 S 1 0.0 8:47.42 zabbix_agentd
1749 root 20 0 19500 1700 1072 R 1 0.0 0:00.27 top

Any ideas? I’m puzzled…

Thanks,

Dirk

I’ve solved this one… Both systems export an NFS share, and I had an Ubuntu 12.04 client using that share; it was recently upgraded from 10.04. Mounting the share as NFSv3 instead of NFSv4 made the problem go away. Very strange, since all the NFS stuff on the server looked good and wasn’t complaining…
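
For reference, the fix on the client was basically just changing the mount type; something along these lines, where the server name and paths are placeholders for my actual setup:

umount /mnt/share
mount -t nfs -o nfsvers=3 server:/export /mnt/share

(Previously it was mounted with -t nfs4. The corresponding /etc/fstab entry was changed the same way.)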

Increased load does not always mean the system is overloaded; in fact it can be pretty idle with a very high load. An NFS stall is the typical case: each new request to a hung NFS share leaves another task stuck in uninterruptible sleep, and every such task increases the load average by 1.
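
If you want to verify that this is what’s happening, the tasks stuck on the share show up in uninterruptible sleep (state D). Something like this should list them (column widths may vary between ps versions):

ps -eo pid,state,wchan:30,cmd | awk '$2 == "D"'

The wchan column usually points at an NFS/RPC wait when a share has hung, which matches the symptom of a climbing load average on an otherwise idle box.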