
Issue with /srv volume

DmytroKh Contributor
Hi,

I'm trying to use PMM launched from the AMI, but I keep running into issues with it.

We started with an m4.2xlarge instance and added 50 MySQL instances (linux:metrics, mysql:metrics, and mysql:queries via slow log) plus 4 ProxySQL instances for monitoring. Note: I added LimitNOFILE=65536 to the prometheus service to get rid of the "Too many open files" error.
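For reference, that override is just a systemd drop-in along these lines (the drop-in path is my assumption; adjust it to the actual unit name on your PMM build):

# /etc/systemd/system/prometheus.service.d/limits.conf
[Service]
LimitNOFILE=65536

# then reload systemd and restart the service
systemctl daemon-reload
systemctl restart prometheus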

The issue: on the second day I noticed the PMM web UI had become unresponsive. It appears the load average is too high:

# uptime
14:01:33 up 1 day, 5:23, 1 user, load average: 24.00, 23.94, 23.39

However, atop shows the CPUs are idle (see the screenshot attached), which suggests the load comes from processes stuck in uninterruptible (D-state) I/O wait rather than from CPU work.
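One way to confirm that (a generic sketch, nothing PMM-specific) is to list D-state processes, since those count toward the load average without using any CPU:

# processes in uninterruptible sleep, typically blocked on I/O
ps -eo state,pid,cmd | awk '$1 == "D"'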


dmesg shows:
[105465.363093] XFS (dm-4): metadata I/O error: block 0x23776f0 ("xfs_buf_iodone_callback_error") error 5 numblks 8
[105466.633074] XFS: Failing async write: 2984 callbacks suppressed
[105466.635627] XFS (dm-4): Failing async write on buffer block 0x23776f0. Retrying async write.
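dm-4 is a device-mapper minor number; to check which logical volume it maps to (I assume the /srv volume, given the mount table below), something like:

# list device-mapper targets with their (major:minor) numbers
dmsetup ls
lsblk /dev/dm-4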


I see the disks are not full:

# df -hT
Filesystem                Type      Size  Used Avail Use% Mounted on
/dev/xvda1                xfs       128G  2.7G  126G   3% /
devtmpfs                  devtmpfs   16G     0   16G   0% /dev
tmpfs                     tmpfs      16G     0   16G   0% /dev/shm
tmpfs                     tmpfs      16G  649M   15G   5% /run
tmpfs                     tmpfs      16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/DataVG-DataLV xfs       205G   28G  178G  14% /srv
tmpfs                     tmpfs     3.2G     0  3.2G   0% /run/user/0
tmpfs                     tmpfs     3.2G     0  3.2G   0% /run/user/1001
# df -h -i
Filesystem                Inodes IUsed IFree IUse% Mounted on
/dev/xvda1                  128M   55K  128M    1% /
devtmpfs                    4.0M   338  4.0M    1% /dev
tmpfs                       4.0M     1  4.0M    1% /dev/shm
tmpfs                       4.0M   397  4.0M    1% /run
tmpfs                       4.0M    16  4.0M    1% /sys/fs/cgroup
/dev/mapper/DataVG-DataLV   205M  525K  205M    1% /srv
tmpfs                       4.0M     1  4.0M    1% /run/user/0



I tried to reboot the server, but it got stuck. My admins rebooted it via the AWS Console, but after the reboot the LVM volume with /srv had disappeared:

# df -hT
Filesystem Type      Size  Used Avail Use% Mounted on
/dev/xvda1 xfs       128G  2.9G  126G   3% /
devtmpfs   devtmpfs   16G     0   16G   0% /dev
tmpfs      tmpfs      16G     0   16G   0% /dev/shm
tmpfs      tmpfs      16G   17M   16G   1% /run
tmpfs      tmpfs      16G     0   16G   0% /sys/fs/cgroup
tmpfs      tmpfs     3.2G     0  3.2G   0% /run/user/1001
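If anyone else ends up in this state: the volume group may still exist and simply not be activated after the failed reboot. Roughly (VG and LV names taken from the earlier df output; I haven't verified this on the AMI):

# check whether the PV/VG/LV metadata survived the reboot
pvs; vgs; lvs
# if DataVG is still listed, activate it and remount
vgchange -ay DataVG
mount /dev/mapper/DataVG-DataLV /srv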

Comments

  • DmytroKh Contributor
    I must say this is the second time we've encountered this issue where the /srv volume disappeared after a reboot triggered by high load average (I wasn't around for the first incident, so I didn't collect any data for a post). We also lost /srv when moving from a t2 to an m4 instance, and when moving to another VPC, via the image clone feature.

    My admins decided to run a new instance with a single volume:
    # df -hT
    Filesystem Type      Size  Used Avail Use% Mounted on
    /dev/xvda1 xfs       512G   25G  488G   5% /
    devtmpfs   devtmpfs   16G     0   16G   0% /dev
    tmpfs      tmpfs      16G     0   16G   0% /dev/shm
    tmpfs      tmpfs      16G  137M   16G   1% /run
    tmpfs      tmpfs      16G     0   16G   0% /sys/fs/cgroup
    tmpfs      tmpfs     3.2G     0  3.2G   0% /run/user/1001

    But something went wrong; at the very least the MySQL setup was incomplete: the MySQL root password wasn't set, so I found a temporary one in /var/log/mysql.log and set the root password to the one from /root/.my.cnf. Next, I found there is no orchestrator database and user, no 'percona'@'localhost' user, and so on.
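    In case it saves someone a step, this is roughly what I did (log path as above; the "temporary password" wording is the stock MySQL 5.7 message, which I assume this build uses, and the new password is a placeholder):

    # find the generated password, then replace it
    grep 'temporary password' /var/log/mysql.log
    mysql -uroot -p -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'new-password-here';"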
  • PERydell Entrant
    This has happened to my PMM instance on EC2 a few times now as well. Something seems pretty wrong when the instance can only run for about a week or two before it gets into this broken state.
  • martay Entrant
    I had the same issue with two PMM instances using the Marketplace AMI. After approximately a week to 10 days, each PMM instance would become unresponsive. I found that the /srv volume is thin-provisioned in LVM and very quickly runs out of metadata space, which makes the /srv volume unwritable. You can check the metadata space usage with lvs -a; see the sketch below.
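    A sketch (the VG/pool names are assumptions based on the original poster's layout; substitute your own from the lvs output):

    # show data and metadata usage, including the hidden thin-pool volumes
    lvs -a -o lv_name,vg_name,data_percent,metadata_percent
    # if Meta% is close to 100, grow the pool's metadata LV
    lvextend --poolmetadatasize +1G DataVG/ThinPool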
