Got an error reading / writing communication packets

Hi,

I have a 5-node XtraDB Cluster (4 nodes + 1 arbitrator).
After upgrading to the latest version I see a lot of errors like these in the error log:

Aborted connection 1952 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error writing communication packets)
Aborted connection 1980 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
wsrep: failed to report last committed -110 (timeout)

I cannot find any network-related issue, and all nodes are on the same subnet with very low latency.

Any ideas on what to check?
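
For anyone who lands here with the same messages: "Got an error reading/writing communication packets" on an aborted connection usually points at client-side timeouts or packet-size limits rather than the wire itself. A hedged sketch of the server variables typically involved (the values shown are the MySQL 5.7 defaults, for illustration only, not recommendations; check the running values with SHOW GLOBAL VARIABLES first):

```ini
# my.cnf fragment: variables commonly behind
# "Aborted connection ... (Got an error reading|writing communication packets)".
# Values shown are the MySQL 5.7 defaults, for illustration only.
[mysqld]
max_allowed_packet = 4M      # packets larger than this abort the connection
net_read_timeout   = 30      # seconds to wait while reading from a client
net_write_timeout  = 60      # seconds to wait while writing to a client
wait_timeout       = 28800   # idle connection timeout (seconds)
```

Watching SHOW GLOBAL STATUS LIKE 'Aborted_c%' over time shows whether the counters are still climbing; a rising Aborted_clients with no application-side errors often just means clients disconnect without a clean COM_QUIT (e.g. monitoring agents or health checks).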

This is the pt-summary output:

Percona Toolkit System Summary Report

Date | 2019-05-31 14:08:45 UTC (local TZ: CEST +0200)
Hostname | DBEasyUnix
Uptime | 20 days, 3:38, 1 user, load average: 2,46, 3,06, 3,04
System | Microsoft Corporation; Virtual Machine; v7.0 (Desktop)
Service Tag | 1595-4263-5740-0369-2475-8856-28
Platform | Linux
Release | Debian GNU/Linux 8.11 (jessie) (jessie)
Kernel | 3.16.0-8-amd64
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.19
SELinux | No SELinux detected
Virtualized | No virtualization detected

Processor

Processors | physical = 1, cores = 8, virtual = 8, hyperthreading = no
Speeds | 8x2099.978
Models | 8xIntel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Caches | 8x15360 KB

Memory

Total | 14.7G
Free | 441.3M
Used | physical = 14.3G, swap allocated = 14.0G, swap used = 49.5M, virtual = 14.3G
Shared | 19.8M
Buffers | 300.3M
Caches | 8.6G
Dirty | 9052 kB
UsedRSS | 5.8G
Swappiness | 60
DirtyPolicy | 20, 10
DirtyStatus | 0, 0
Locator Size Speed Form Factor Type Type Detail
========= ======== ================= ============= ============= ===========
M1 11392 MB Unknown Unknown Other Unknown
M0 3968 MB Unknown Unknown Other Unknown

Mounted Filesystems

Filesystem Size Used Type Opts Mountpoint
/dev/sda1 178G 41% ext4 rw,relatime,errors=remount-ro,data=ordered /
/dev/sdc1 13T 66% ext4 rw,noatime,nodiratime,errors=remount-ro,data=ordered /var/lib/mysql
tmpfs 1,5G 0% tmpfs rw,nosuid,nodev,relatime,size=1542832k,mode=700 /run/user/0
tmpfs 3,0G 1% tmpfs rw,nosuid,relatime,size=3085664k,mode=755 /run
tmpfs 5,0M 0% tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k /run/lock
tmpfs 7,4G 0% tmpfs rw,nosuid,nodev /dev/shm
tmpfs 7,4G 0% tmpfs ro,nosuid,nodev,noexec,mode=755 /sys/fs/cgroup
udev 10M 0% devtmpfs rw,relatime,size=10240k,nr_inodes=1926397,mode=755 /dev

Disk Schedulers And Queue Size

sda | [cfq] 128
sdb | [cfq] 128
sdc | [cfq] 128

Disk Partitioning

Device Type Start End Size
============ ==== ========== ========== ==================
/dev/sda Disk 193273528320
/dev/sda1 Part 2048 377485311 0
/dev/sdb Disk 15032385536
/dev/sdb1 Part 2048 29358079 0
/dev/sdc Disk 14293651161088
/dev/sdc1 Part 2048 27917285375 0

Kernel Inode State

dentry-state | 31528 19480 45 0 0 0
file-nr | 1984 0 1540893
inode-nr | 86673 56832

LVM Volumes

Unable to collect information

LVM Volume Groups

Unable to collect information

RAID Controller

Controller | No RAID controller detected

Network Config

FIN Timeout | 60
Port Range | 61000

Interface Statistics

interface rx_bytes rx_packets rx_errors tx_bytes tx_packets tx_errors
========= ========= ========== ========== ========== ========== ==========
lo 1250000 10000 0 1250000 10000 0
eth0 500000000000 500000000 0 60000000000 250000000 0
eth1 300000000000 900000000 0 9000000000000 400000000 0

Network Connections

Connections from remote IP addresses
10.0.0.38 2
10.0.0.80 3
10.0.0.81 1
10.0.0.82 2
192.168.3.39 10
192.168.3.222 5
Connections to local IP addresses
10.0.0.37 8
192.168.3.37 15
Connections to top 10 local ports
3306 10
42002 1
4567 3
50430 1
50506 1
51597 1
57345 1
59512 1
59514 1
60933 1
States of connections
ESTABLISHED 25
LISTEN 20
TIME_WAIT 1

Top Processes

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8637 root 20 0 41196 13312 3476 S 37,2 0,1 567:50.08 node_expor+
8680 root 20 0 53968 29504 4024 S 37,2 0,2 947:07.50 mysqld_exp+
33 root 20 0 0 0 0 S 18,6 0,0 274:13.03 ksoftirqd/5
25143 mysql 20 0 66,682g 5,536g 832760 S 18,6 37,6 32:53.22 mysqld
3 root 20 0 0 0 0 S 12,4 0,0 339:51.83 ksoftirqd/0
43 root 20 0 0 0 0 S 12,4 0,0 326:40.74 ksoftirqd/7
41817 root 20 0 25616 2900 2468 R 6,2 0,0 0:00.01 top
1 root 20 0 28976 4968 2908 S 0,0 0,0 0:30.09 systemd
2 root 20 0 0 0 0 S 0,0 0,0 0:00.22 kthreadd

Notable Processes

PID OOM COMMAND
691 -17 sshd

Simplified and fuzzy rounded vmstat (wait please)

procs ---swap-- -----io---- ---system---- --------cpu--------
r b si so bi bo ir cs us sy il wa st
1 0 0 0 700 225 2 2 5 2 89 4 0
0 0 0 0 1500 2500 3000 7000 11 5 75 9 0
1 1 150 0 1000 3500 2000 4000 8 2 85 5 0
0 0 0 0 700 1500 2250 4500 10 2 87 1 0
3 0 0 0 0 2500 1500 3500 6 1 89 4 0

Memory management

Transparent huge pages are enabled.

The End

I have the same problem. It happened after the update to 5.7.25, and the latest version still has the issue.

Did you ever find a fix for this, inoma?

I have no idea what fixed this; I haven't made any config changes. What happened was that one node failed, so I made an image, terminated the EC2 instance, and restored the AMI to a new EC2 machine. The errors were gone on that node. Then I went ahead and killed another node to see if it showed the same behavior, and it did.

Long story short: I terminated and rebuilt all my nodes without making any Percona changes, and the error magically went away.

No idea what caused this and no idea what fixed it. I feel left in the dark.