Problems with QAN in 1.8.0

After updating from 1.7.0 to 1.8.0, some DB instances couldn’t send data to QAN.

:~# pmm-admin list
pmm-admin 1.8.0

PMM Server | pmm.qa.com
Client Name | master.db.com
Client Address | 172.*.*.*
Service Manager | linux-systemd

-------------- ------------------------------------ ----------- -------- ------------------------------------------ ------------------------------------------
SERVICE TYPE NAME LOCAL PORT RUNNING DATA SOURCE OPTIONS
-------------- ------------------------------------ ----------- -------- ------------------------------------------ ------------------------------------------
mysql:queries master.db.com - YES pmm:***@unix(/var/run/mysqld/mysqld.sock) query_source=slowlog, query_examples=true
linux:metrics master.db.com 42000 YES -
mysql:metrics master.db.com 42002 YES pmm:***@unix(/var/run/mysqld/mysqld.sock)
~# pmm-admin check-network
PMM Network Status

Server Address | pmm.qa.com
Client Address | 172.*.*.*

* System Time
NTP Server (0.pool.ntp.org) | 2018-03-03 13:58:34 +0000 UTC
PMM Server | 2018-03-03 13:58:34 +0000 GMT
PMM Client | 2018-03-03 13:58:34 +0000 UTC
PMM Server Time Drift | OK
PMM Client Time Drift | OK
PMM Client to PMM Server Time Drift | OK

* Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API OK

Connection duration | 800.78µs
Request duration | 1.69178ms
Full round trip | 2.49256ms


* Connection: Client <-- Server
-------------- ------------------------------------ --------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------------------------------ --------------------- ------- ---------- ---------
linux:metrics master.db.com 172.*.*.*:42000 OK YES -
mysql:metrics master.db.com 172.*.*.*:42002 OK YES -

But in Grafana I see an error (screenshot not included in this transcript).

Also, when I try to remove mysql:queries in order to add it again, I get an error:

~# pmm-admin rm mysql:queries
Error removing MySQL queries master.db.com: timeout 10s waiting on agent to connect to API.


The second issue is not connected with 1.8.0.
In the query fingerprint I see wrong schema detection. For example:
db_server 1: has a user schema with an order table.
db_server 2: has a message schema with a mail table.

In QAN I am checking the user server. In the fingerprint I see

select from message.order ...

but there is no message schema on the user server.

Everything else works OK; Explain and table structure all look correct.

Hi Stateros, were you able to determine why some hosts stopped sending metrics after the upgrade to 1.8.0?
Can you please share the contents from /var/log/pmm-mysql-metrics* for an affected host so we can see why it is failing?

Regarding the schema: yes, we have identified this as a bug, but haven’t yet identified a release for the fix. Please follow this ticket; we welcome your commentary! Thanks:
https://jira.percona.com/browse/PMM-2266

Michael Coburn, I don’t have log files like the ones you mentioned. And I had updated pmm-server and all clients to 1.8.1 on release day.

less /var/log/
anaconda/ lastlog prometheus.log.2
btmp mysql.log purge-qan-data.log
consul.log nginx/ qan-api.log
createdb.log nginx.log rhsm/
createdb2.log node_exporter.log supervisor/
createdb3.log orchestrator.log tallylog
cron.log pmm-manage.log wtmp
dashboard-upgrade.log pmm-managed.log yum.log
grafana/ prometheus.log
grubby_prune_debug prometheus.log.1

Michael Coburn
Hi, I encountered the same error message in Grafana. Let me describe what I did here.

I pulled the latest PMM and installed it according to the documentation, and everything worked fine,

but I wanted to move the template storage from sqlite3 to MySQL, so I modified grafana.ini like this:

# Either "mysql", "postgres" or "sqlite3", it's your choice
type = mysql
host = 127.0.0.1:3306
name = grafana
user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =

and then restarted the pmm-server container in Docker.

Next, I chose the data source and imported the dashboards in the Grafana web UI.

Oh no! Query Analytics did not work!

Please help me, thanks.

Thank you!!!

Hi Stateros, sorry for not being clear: the files in /var/log/pmm-* will exist on any node that has the pmm-client package installed and is running exporters.

Hi Ezail,

I suspect your issue is different from what Stateros is experiencing, and I suggest you open a separate thread for your request.

Can you share in that new thread the actual error message you are receiving? Thanks!


Thanks for your support, I have solved my issue.
When changing the template storage from sqlite3 to MySQL, I had to add a new data source named api-qan(mysql), which is used to store the template; I had not done this step before.

But I have hit another issue; I will open a separate thread to discuss it.

Thanks!!!

Hi Michael Coburn, here are 2 log files with MySQL metrics and queries covering 2 days:

pmm-mysql-queries.txt (17.5 KB)

pmm-mysql-metrics.txt (36 KB)

After updating to 1.9.0 the problem still exists.

Hi Stateros

Looking at mysql-metrics, it appears the exporter is starting up repeatedly, but I don’t see where the exporter is being killed:

time="2018-03-27T14:00:05Z" level=info msg="Listening on db_server_ip:42002" source="mysqld_exporter.go:393"
time="2018-03-27T14:30:05Z" level=info msg="Starting mysqld_exporter (version=1.8.1, branch=master, revision=74d5373dceed55bf9cb15a932fa0bedd8996e251)" source="mysqld_exporter.go:286"

Can you check if you have more than one exporter running?

ps -ef | grep mysqld_exporter

And in mysql-queries, we can see the agent starting, then immediately getting terminated:

2018/03/27 11:30:04.822084 main.go:194: API is ready
2018/03/27 11:30:04.887335 main.go:349: Caught terminated signal, shutting down
2018/03/27 11:30:04.887374 main.go:375: Stopping QAN...

I would suggest you remove the exporter/QAN agent and re-add it:

pmm-admin remove mysql
pmm-admin add mysql
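A slightly more explicit form of that remove/re-add cycle, as a sketch: the instance name and socket path below are taken from the `pmm-admin list` output earlier in the thread, and the user/socket values are examples to adjust for your own setup.

```shell
# Remove only the queries service for this instance, then re-add it.
# "master.db.com" is the instance name shown by `pmm-admin list`.
pmm-admin rm mysql:queries master.db.com
pmm-admin add mysql:queries --user pmm --socket /var/run/mysqld/mysqld.sock

# Verify the service came back and is marked as running.
pmm-admin list
```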

Michael Coburn

~# ps -ef | grep mysqld_exporter
root 23777 1 0 13:30 ? 00:00:00 /bin/sh -c /usr/local/percona/pmm-client/mysqld_exporter -collect.auto_increment.columns=true -collect.binlog_size=true -collect.global_status=true -collect.global_variables=true -collect.info_schema.innodb_metrics=true -collect.info_schema.processlist=true -collect.info_schema.query_response_time=true -collect.info_schema.tables=true -collect.info_schema.tablestats=true -collect.info_schema.userstats=true -collect.perf_schema.eventswaits=true -collect.perf_schema.file_events=true -collect.perf_schema.indexiowaits=true -collect.perf_schema.tableiowaits=true -collect.perf_schema.tablelocks=true -collect.slave_status=true -web.listen-address=172.25.127.186:42002 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml -web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt -web.ssl-key-file=/usr/local/percona/pmm-client/server.key >> /var/log/pmm-mysql-metrics-42002.log 2>&1
root 23778 23777 3 13:30 ? 00:04:21 /usr/local/percona/pmm-client/mysqld_exporter -collect.auto_increment.columns=true -collect.binlog_size=true -collect.global_status=true -collect.global_variables=true -collect.info_schema.innodb_metrics=true -collect.info_schema.processlist=true -collect.info_schema.query_response_time=true -collect.info_schema.tables=true -collect.info_schema.tablestats=true -collect.info_schema.userstats=true -collect.perf_schema.eventswaits=true -collect.perf_schema.file_events=true -collect.perf_schema.indexiowaits=true -collect.perf_schema.tableiowaits=true -collect.perf_schema.tablelocks=true -collect.slave_status=true -web.listen-address=172.25.127.186:42002 -web.auth-file=/usr/local/percona/pmm-client/pmm.yml -web.ssl-cert-file=/usr/local/percona/pmm-client/server.crt -web.ssl-key-file=/usr/local/percona/pmm-client/server.key
root 27437 27409 0 15:47 pts/0 00:00:00 grep --color=auto mysqld_exporter

What should I do with it? I can’t kill either of the processes: if I kill one of them, the second one also dies.
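That pair of PIDs is expected: the first process is just a `/bin/sh -c` wrapper that redirects the exporter’s output to the log file, and the second is the exporter itself, so killing either one takes both down. Rather than killing PIDs directly, the service can be bounced through pmm-admin; a sketch, assuming PMM 1.x command names:

```shell
# Restart the metrics exporter via pmm-admin rather than kill(1);
# this stops the /bin/sh wrapper and the exporter together and
# starts a fresh pair.
pmm-admin restart mysql:metrics

# Or stop and start explicitly:
pmm-admin stop mysql:metrics
pmm-admin start mysql:metrics
```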

It didn’t help. I still see this in the log:

# Version: percona-qan-agent 1.0.5
# Basedir: /usr/local/percona/qan-agent
# PID: 12490
# API: pmm.qa.com/qan-api
# UUID: 640466ec83f743e262efd49a6b6ec7f2
2018/04/16 06:00:05.393009 main.go:153: Starting agent...
2018/04/16 06:00:05.396324 main.go:321: Agent is ready
2018/04/16 06:00:05.423233 main.go:194: API is ready
2018/04/16 06:00:05.446720 main.go:349: Caught terminated signal, shutting down
2018/04/16 06:00:05.446758 main.go:375: Stopping QAN...
2018/04/16 06:00:05.448388 main.go:382: Waiting 2 seconds to flush agent log to API...
2018/04/16 06:00:07.448587 main.go:157: Agent has stopped
# Version: percona-qan-agent 1.0.5
# Basedir: /usr/local/percona/qan-agent
# PID: 12535
# API: pmm.qa.com/qan-api
# UUID: 640466ec83f743e262efd49a6b6ec7f2
2018/04/16 06:00:07.483234 main.go:153: Starting agent...
2018/04/16 06:00:07.492553 main.go:321: Agent is ready
2018/04/16 06:00:07.493524 main.go:194: API is ready

Hi Stateros

Are all your systems using systemd? Is there anything in the systemd logs that indicates why the binaries are being shut down?
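If systemd is in use, the unit journals are the place to look. A sketch, assuming PMM 1.x client unit names of the form pmm-<service>-<port>; check the actual names and ports on your host first, and note the timestamps below are just the ones from the agent log excerpt above:

```shell
# List the PMM units systemd knows about on this client.
systemctl list-units 'pmm-*'

# Pull the journal around one of the restart timestamps seen in the
# agent log, to catch whatever issued the stop.
journalctl -u pmm-mysql-queries-0 --since "2018-04-16 05:55" --until "2018-04-16 06:05"
journalctl -u pmm-mysql-metrics-42002 -n 100
```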

Hi Stateros
Did you try to re-install pmm-client? Is it CentOS?

Hi nailya, no, I use Ubuntu 16.04 on all PMM and DB servers. Sure, I tried updating and re-installing. I even rebuilt the pmm-server.

Hi Michael Coburn, checked with our OPS guys. Nothing suspicious.

It looks like after updating PMM to 1.10.0 there are no issues anymore.

Hi,

I have the same issue with PMM 1.10.0. Everything is configured, but it doesn’t work. Please help me.