Hi,
I’m trying to use PMM started from an AMI, but I keep encountering issues with it.
We started with an m4.2xlarge instance and added 50 MySQL instances (linux:metrics, mysql:metrics, mysql:queries via the slow log), plus 4 ProxySQL instances for monitoring. Note: I added LimitNOFILE=65536 to the Prometheus systemd service to get rid of the “Too many open files” error.
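For reference, this is roughly how I applied the limit — as a systemd drop-in so it survives package updates (the unit name and drop-in path are assumptions based on my setup):

```shell
# Create a drop-in override for the Prometheus unit (unit name assumed)
sudo mkdir -p /etc/systemd/system/prometheus.service.d
sudo tee /etc/systemd/system/prometheus.service.d/limit_nofile.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF
# Reload unit files and restart so the new limit takes effect
sudo systemctl daemon-reload
sudo systemctl restart prometheus
```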
The issue: on the second day I noticed the PMM web UI had become unresponsive. It appears the load average is too high:
# uptime
14:01:33 up 1 day, 5:23, 1 user, load average: 24.00, 23.94, 23.39
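A high load average with idle CPUs usually means tasks stuck in uninterruptible (D-state) sleep on I/O, which still counts toward the load average. I checked with something like (a generic ps invocation, nothing PMM-specific):

```shell
# Count processes in uninterruptible sleep (state D) -- these inflate
# the load average even while the CPUs sit idle
ps -eo state,pid,comm | awk '$1 ~ /^D/' | wc -l
```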
However, atop shows the CPUs are idle (see the attached screenshot), and dmesg shows:
[105465.363093] XFS (dm-4): metadata I/O error: block 0x23776f0 (“xfs_buf_iodone_callback_error”) error 5 numblks 8
[105466.633074] XFS: Failing async write: 2984 callbacks suppressed
[105466.635627] XFS (dm-4): Failing async write on buffer block 0x23776f0. Retrying async write.
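Error 5 is EIO, which suggests the block device under the filesystem stopped responding rather than an XFS-level problem. To check whether the underlying EBS volume itself was failing, something like this should work (device names taken from the df output below; the dd read is only a probe, not a fix):

```shell
# Map dm-4 from the dmesg messages back to an LV / device name
sudo dmsetup ls
sudo lvs -o lv_name,vg_name,devices

# Try a direct read from the device; an I/O error here points at the
# underlying EBS volume rather than the filesystem
sudo dd if=/dev/mapper/DataVG-DataLV of=/dev/null bs=1M count=16 iflag=direct
```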
I see the disks are not full:
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda1 xfs 128G 2.7G 126G 3% /
devtmpfs devtmpfs 16G 0 16G 0% /dev
tmpfs tmpfs 16G 0 16G 0% /dev/shm
tmpfs tmpfs 16G 649M 15G 5% /run
tmpfs tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/DataVG-DataLV xfs 205G 28G 178G 14% /srv
tmpfs tmpfs 3.2G 0 3.2G 0% /run/user/0
tmpfs tmpfs 3.2G 0 3.2G 0% /run/user/1001
df -h -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda1 128M 55K 128M 1% /
devtmpfs 4.0M 338 4.0M 1% /dev
tmpfs 4.0M 1 4.0M 1% /dev/shm
tmpfs 4.0M 397 4.0M 1% /run
tmpfs 4.0M 16 4.0M 1% /sys/fs/cgroup
/dev/mapper/DataVG-DataLV 205M 525K 205M 1% /srv
tmpfs 4.0M 1 4.0M 1% /run/user/0
I tried to reboot the server, but it hung. My admins rebooted it via the AWS Console, but after the reboot the LVM volume with /srv had disappeared:
df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda1 xfs 128G 2.9G 126G 3% /
devtmpfs devtmpfs 16G 0 16G 0% /dev
tmpfs tmpfs 16G 0 16G 0% /dev/shm
tmpfs tmpfs 16G 17M 16G 1% /run
tmpfs tmpfs 16G 0 16G 0% /sys/fs/cgroup
tmpfs tmpfs 3.2G 0 3.2G 0% /run/user/1001
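I’m planning to check whether the volume group simply wasn’t activated after the unclean reboot, or whether the EBS volume didn’t reattach at all. Roughly this (VG/LV names from the earlier df output; the /dev/xvdb device name is just a guess and may need adjusting):

```shell
# Is the data disk attached at all? (device name is an assumption)
lsblk

# Does LVM still see the physical volume and volume group?
sudo pvscan
sudo vgscan

# If the VG is there but inactive, activate it and remount
sudo vgchange -ay DataVG
sudo mount /dev/mapper/DataVG-DataLV /srv
```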