ERROR qan-analyzer driver: bad connection

Hi!
After starting “pmm-admin start --all” everything works fine, for a few time. After 2-3 hours approximately, Query Analytics stop to show data. We have 2 servers for now, and they both have the same problem.
In /var/log/pmm-mysql-queries-0.log we can see:

[mysql] 2017/01/30 20:29:09 packets.go:59: unexpected EOF
[mysql] 2017/01/30 20:29:09 packets.go:386: busy buffer
2017/01/30 20:29:09.538585 ERROR qan-analyzer-1342087d driver: bad connection
2017/01/30 20:30:00.001315 WARNING qan-analyzer-1342087d-worker Interval out of sequence: got 186, expected 182
2017/01/30 20:30:08.733680 ERROR qan-analyzer-1342087d-worker Got class twice: registry_dev2 e1fecd37897b7375d76acae411ed0f5b
2017/01/30 20:31:00.011321 WARNING qan-analyzer-1342087d Skipping interval ‘187 2017-01-30 18:30:00 UTC to 2017-01-30 18:31:00 UTC (0-0)’ because interval ‘186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)’ is still being parsed
2017/01/30 20:32:00.005720 WARNING qan-analyzer-1342087d Skipping interval ‘188 2017-01-30 18:31:00 UTC to 2017-01-30 18:32:00 UTC (0-0)’ because interval ‘186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)’ is still being parsed
2017/01/30 20:33:00.001729 WARNING qan-analyzer-1342087d Skipping interval ‘189 2017-01-30 18:32:00 UTC to 2017-01-30 18:33:00 UTC (0-0)’ because interval ‘186 2017-01-30 18:29:00 UTC to 2017-01-30 18:30:00 UTC (0-0)’ is still being parsed
[mysql] 2017/01/30 20:33:28 packets.go:59: unexpected EOF
[mysql] 2017/01/30 20:33:28 packets.go:386: busy buffer
[mysql] 2017/01/30 20:33:28 connection.go:307: invalid connection
2017/01/30 20:33:28.778027 ERROR qan-analyzer-1342087d driver: bad connection
2017/01/30 20:34:00.003532 WARNING qan-analyzer-1342087d-worker Interval out of sequence: got 190, expected 187
2017/01/30 20:34:04.971836 ERROR qan-analyzer-1342087d-worker Got class twice: registry_dev2 e1fecd37897b7375d76acae411ed0f5b

After “pmm-admin restart --all” there is no errors for some time, Query Analytics works fine. But not for long. Same errors.

Server and clients was reinstalled by documentation 3 times. No effect.
How can i fix this problem?

Hi Aleksey,

Did you have MySQL restarts in this period? https://github.com/go-sql-driver/mysql/issues/449

No. pmm client lost connection every hour now. Approximately. All server work without restart few weeks

looks strange,
is pmm-admin ran locally on database host?
can you share output? (just replace IPs by x.x.x.x)

[root@mysql ~]# pmm-admin list
pmm-admin 1.0.7

PMM Server | 1.1.1.1:82
Client Name | mysql.db
Client Address | 1.1.1.9
Service Manager | linux-systemd


SERVICE TYPE NAME LOCAL PORT RUNNING DATA SOURCE OPTIONS


mysql:queries mysql.db - YES root:@unix(/var/lib/mysql/mysql.sock) query_source=perfschema
linux:metrics mysql.db 42000 YES -
mysql:metrics mysql.db 42002 YES root:
@unix(/var/lib/mysql/mysql.sock)

And yes, pmm-admin runs locally on database host

Do you use MariaDB+Galera? https://jira.mariadb.org/browse/MDEV-10812

No, only percona server:
Percona-Server-server-57-5.7.16-10.1.el7.x86_64
Percona-Server-shared-compat-57-5.7.16-10.1.el7.x86_64
Percona-Server-client-57-5.7.16-10.1.el7.x86_64
percona-zabbix-templates-1.1.7-2.noarch
percona-toolkit-2.2.20-1.noarch
percona-release-0.1-4.noarch
Percona-Server-shared-57-5.7.16-10.1.el7.x86_64

What is your OS?
Is selinux disabled?
do you have any periodic jobs?
Is you systems has enough max connection count?

CentOS 7 updated
selinux disabled:

[root@mysql ~]# sestatus
SELinux status: disabled

periodic jobs:

  1. retarting qan(until driver: bad connection will fix)

[root@mysql ~]# crontab -l
*/30 * * * * /usr/sbin/pmm-admin restart mysql:queries > /dev/null

  1. /etc/cron.hourly/logrotate.mysql

#!/bin/sh

/usr/sbin/logrotate /etc/logrotate.d/mysql
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
/usr/bin/logger -t logrotate “ALERT exited abnormally with [$EXITVALUE]”
fi
exit 0

  1. /etc/logrotate.d/mysql

/var/log/mysqld.log {
notifempty
size 100M
rotate 100
missingok
compress
olddir /var/log/mysql_old_logs
postrotate
touch /var/log/mysqld.log
chown mysql:mysql /var/log/mysqld.log
chmod 600 /var/log/mysqld.log
mysqladmin flush-logs
endscript
}

  1. all other centos7 system jobs without changes

Is you systems has enough max connection count?

net.ipv4.ip_local_port_range = 32768 60999
net.core.somaxconn = 128

Or how can i find out it?

  1. maybe you have any periodic task which use mysql on your application server and runs every 3 hours?
  2. can you share “MySQL Connections” graph?
  1. Well, we have big counts of tasks, that use server, but all from other servers, no local tasks. Database creation, dropping, DDL, DML etc. A lot.
  2. Images by url: https://www.dropbox.com/sh/geb4muklv…iKF4jVRva?dl=0
    Sorry, i cant overcome image uploading on this forum

can you try to switch QAN Collect source from “Performance Schema” to “Slow Log”?

Ok, i will.
By the way, 3 hours ago i had tuned new server, with same settings. And what? The same problem.

So, with “–query-source slowlog” everything fine. No problems since yesterday morning, it is about 20 hours.
Another problem - slowquery.log grow very fast, 890M from time of switching from performance_schema. Can i rotate it without loss of collected data in PMM?

yes, you can, QAN should work fine, if any issues please let me know.

don’t use https://www.percona.com/doc/percona-…_rotation.html

“This feature is currently considered BETA quality.”
Can i use it in production environment, or it is better for now to use logrotate?

It’s beta, so up to you.
Log rotate is also should work OK. Example configuration https://www.percona.com/blog/2013/04/18/rotating-mysql-slow-logs-safely/

Thanks. But you have to fix problem with performance_schame in QAN :slight_smile:

after some research
please don’t use slowlog_rotation in PS, it is BETA and looks like it has some strange handling inside PMM.
please write immediately if you have any issues with log rotate.