Not Able to connect to QAN API.

We recently upgraded both pmm-server and pmm-client. I tried using all the latest versions and I can’t get Query Analytics dashboard. The other two metrics work great except for QAN.

Config:
16:26:48 # pmm-admin list
pmm-admin 1.13.0

PMM Server | pmm.us.devhealtheintent.net
Client Name | HI_PXC_PRM1
Client Address | ...*
Service Manager | linux-upstart


SERVICE TYPE NAME LOCAL PORT RUNNING DATA SOURCE OPTIONS


mysql:queries HI_PXC_PRM1 - YES pmm:@unix(/var/run/mysql/mysql.sock) query_source=slowlog, query_examples=true, slow_log_rotation=true, retain_slow_logs=1
linux:metrics HI_PXC_PRM1 42000 YES -
mysql:metrics HI_PXC_PRM1 42002 YES pmm:
@unix(/var/run/mysql/mysql.sock)

Error on client side (/var/log/pmm-mysql-queries-0.log) :

Version: percona-qan-agent 1.0.5

Basedir: /usr/local/percona/qan-agent

PID: 31751

API: pmm.us.devhealtheintent.net/qan-api

UUID: 13df0ec1fb8b407953be016765051c20

2019/02/20 15:47:04.855978 main.go:153: Starting agent…
2019/02/20 15:47:04.857152 main.go:321: Agent is ready
2019/02/20 15:47:04.879164 main.go:194: API is ready
{UUID: CollectFrom: Interval:60 ExampleQueries:0xc4200bffd0 MaxSlowLogSize:1073741824 SlowLogRotation:0xc4200bffd1 RetainSlowLogs:0xc4200bffd8 Start: Stop: ReportLimit:200}
{UUID:bf9896a3a9364a36552d148b3ea64ebe CollectFrom:slowlog Interval:60 ExampleQueries:0xc4200bffd0 MaxSlowLogSize:1073741824 SlowLogRotation:0xc4200bffd1 RetainSlowLogs:0xc4200bffd8 Start: Stop: ReportLimit:200}
2019/02/20 15:49:10.878195 WARNING data-sender Removed qan_1550699340036594312 because API returned 400: json.Unmarshal(data): unexpected end of JSON input

Error on Server Side (qan-api log):
[0m2019/02/20 19:17:43.520 127.0.0.1 200 234.896µs GET /
[0m[0;37mINFO 2019/02/20 19:17:43 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/20 19:17:43.527 127.0.0.1 404 3.363947ms GET /agents/21ec5d7b364740cb43e2d1f15081b355
[0m[0;37mINFO 2019/02/20 19:17:45 init.go:234: auth agent: not found: 8ce09a49d50d4f8c637e895ab3f34b85
[0m2019/02/20 19:17:45 server.go:2923: http: response.WriteHeader on hijacked connection
2019/02/20 19:17:45 server.go:2923: http: response.Write on hijacked connection
[0;31mERROR 2019/02/20 19:17:45 results.go:336: Response write failed: http: connection has been hijacked
[0m2019/02/20 19:17:45.989 127.0.0.1 404 4.914743ms WS /agents/8ce09a49d50d4f8c637e895ab3f34b85/log
[0m2019/02/20 19:17:46.539 127.0.0.1 200 303.829µs GET /
[0m[0;37mINFO 2019/02/20 19:17:46 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/20 19:17:46.545 127.0.0.1 404 2.645553ms GET /agents/21ec5d7b364740cb43e2d1f15081b355

Hi avinashgudapati

Without getting into the troubleshooting, have you tried removing and re-adding the service?

pmm-admin rm mysql:queries
pmm-admin add mysql:queries

Yes multiple times. I have been working on this for last 3 days. I tried removing, uninstalling pmm-client and pmm-server still the issue persists.

avinashgudapati pls 1) update your client, it’s not the latest. (you have pmm-admin 1.13.0 and we just released 1.17.1 , but at least 1.17.0)
2) Also make sure your MySQL server configures correctly based on the recommendation you see in Query Analytics dashboard
3) run pmm-admin check-network , it can provide some additional information about the problem
4) try to use Performance Schema instead of slow log for monitoring
5) see [url]Percona Monitoring and Management and check other logs

btw, What Server you using? Docker ? AMI? and is it password protected?

roma.novikov Thanks for the above Steps. I tried all the changes you suggested.

  1. I updated the client and server version both to 1.17.*
  2. Tried using Performance schema instead of slowlog
  3. Went through the documentation and made sure everything is configured correctly.
  4. We are using Docker for hosting server and we have something in PROD that uses pmm-server version 1.2.0 and everything works great on that. So, It shouldn’t be anything with network or authentication I guess.

Check-network info. The error details after the latest update are same as I mentioned in original post.

10:50:27 # pmm-admin list
pmm-admin 1.17.0

PMM Server | ****
Client Name | HI_PXC_PRM1
Client Address | ***
Service Manager | linux-upstart

-------------- ------------ ----------- -------- ----------------------------------------- ---------------------------------------------
SERVICE TYPE NAME LOCAL PORT RUNNING DATA SOURCE OPTIONS
-------------- ------------ ----------- -------- ----------------------------------------- ---------------------------------------------
mysql:queries HI_PXC_PRM1 - YES root:***@unix(/var/run/mysql/mysql.sock) query_source=perfschema, query_examples=true
linux:metrics HI_PXC_PRM1 42000 YES -
mysql:metrics HI_PXC_PRM1 42002 YES root:***@unix(/var/run/mysql/mysql.sock)

10:50:31 # pmm-admin check-network
PMM Network Status

Server Address | *****
Client Address | *****

* System Time
NTP Server (0.pool.ntp.org) | 2019-02-21 16:50:49 +0000 UTC
PMM Server | 2019-02-21 16:50:49 +0000 GMT
PMM Client | 2019-02-21 10:50:49 -0600 CST
PMM Server Time Drift | OK
PMM Client Time Drift | OK
PMM Client to PMM Server Time Drift | OK

* Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API OK

Connection duration | 1.606978ms
Request duration | 12.226706ms
Full round trip | 13.833684ms


* Connection: Client <-- Server
-------------- ------------ -------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------ -------------------- ------- ---------- ---------
linux:metrics HI_PXC_PRM1 10.190.158.18:42000 OK YES -
mysql:metrics HI_PXC_PRM1 10.190.158.18:42002 OK YES - 

one more important thing I need to add is the dashboard shows up sometimes but fails mostly. Dashboard showing up sometimes is completely random.

pmm-mysql-queries-0.log:

# Version: percona-qan-agent 1.0.5
# Basedir: /usr/local/percona/qan-agent
# PID: 25527
# API: pmm.us.******.net/qan-api
# UUID: 76332b491e5148b1697f5df90307fad6
2019/02/21 17:14:40.855941 main.go:153: Starting agent...
2019/02/21 17:14:40.856668 main.go:321: Agent is ready
2019/02/21 17:14:40.882059 main.go:194: API is ready
{UUID: CollectFrom: Interval:60 ExampleQueries:0xc4200b3e70 MaxSlowLogSize:1073741824 SlowLogRotation:0xc4200b3e71 RetainSlowLogs:0xc4200b3e78 Start:[] Stop:[] ReportLimit:200}
{UUID:05ee8b546bf74a166c2a1d01ef963332 CollectFrom:slowlog Interval:60 ExampleQueries:0xc4200b3e70 MaxSlowLogSize:1073741824 SlowLogRotation:0xc4200b3e71 RetainSlowLogs:0xc4200b3e78 Start:[] Stop:[] ReportLimit:200}
2019/02/21 17:16:46.876516 WARNING data-sender Removed qan_1550790960085447396 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:17:49.873148 WARNING data-sender Removed qan_1550791020053053785 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:18:52.875432 WARNING data-sender Removed qan_1550791080033521360 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:19:55.875173 WARNING data-sender Removed qan_1550791140044002073 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:20:58.873694 WARNING data-sender Removed qan_1550791200041398419 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:22:01.874818 WARNING data-sender Removed qan_1550791260077403456 because API returned 400: json.Unmarshal(data): unexpected end of JSON input
2019/02/21 17:22:01.875340 WARNING data-sender Removed qan_1550791320024656449 because API returned 400: json.Unmarshal(data): invalid character 'k' looking for beginning of value

qan-api log:

[0m2019/02/22 00:19:01 server.go:2923: http: response.WriteHeader on hijacked connection
2019/02/22 00:19:01 server.go:2923: http: response.Write on hijacked connection
[0;31mERROR 2019/02/22 00:19:01 results.go:336: Response write failed: http: connection has been hijacked
[0m2019/02/22 00:19:01.633 127.0.0.1 404 3.255622ms WS /agents/202623ee9cfb42f260976f9d30b9ff8f/log
[0m2019/02/22 00:19:03.834 127.0.0.1 200 216.391µs GET /
[0m[0;37mINFO 2019/02/22 00:19:03 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:03.842 127.0.0.1 404 4.184936ms GET /agents/21ec5d7b364740cb43e2d1f15081b355
[0m[0;37mINFO 2019/02/22 00:19:04 init.go:234: auth agent: not found: 202623ee9cfb42f260976f9d30b9ff8f
[0m2019/02/22 00:19:04 server.go:2923: http: response.WriteHeader on hijacked connection
2019/02/22 00:19:04 server.go:2923: http: response.Write on hijacked connection
[0;31mERROR 2019/02/22 00:19:04 results.go:336: Response write failed: http: connection has been hijacked
[0m2019/02/22 00:19:04.463 127.0.0.1 404 3.438965ms WS /agents/202623ee9cfb42f260976f9d30b9ff8f/data
[0m2019/02/22 00:19:06.854 127.0.0.1 200 279.458µs GET /
[0m[0;37mINFO 2019/02/22 00:19:06 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:06.861 127.0.0.1 404 2.668192ms GET /agents/21ec5d7b364740cb43e2d1f15081b355
[0m[0;37mINFO 2019/02/22 00:19:07 init.go:234: auth agent: not found: 202623ee9cfb42f260976f9d30b9ff8f
[0m2019/02/22 00:19:07 server.go:2923: http: response.WriteHeader on hijacked connection
2019/02/22 00:19:07 server.go:2923: http: response.Write on hijacked connection
[0;31mERROR 2019/02/22 00:19:07 results.go:336: Response write failed: http: connection has been hijacked
[0m2019/02/22 00:19:07.481 127.0.0.1 404 3.036275ms WS /agents/202623ee9cfb42f260976f9d30b9ff8f/data
[0m2019/02/22 00:19:09.873 127.0.0.1 200 407.57µs GET /
[0m[0;37mINFO 2019/02/22 00:19:09 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:09.880 127.0.0.1 404 3.639746ms GET /agents/21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:12.890 127.0.0.1 200 197.626µs GET /
[0m[0;37mINFO 2019/02/22 00:19:12 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:12.897 127.0.0.1 404 3.936918ms GET /agents/21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:15.909 127.0.0.1 200 250.592µs GET /
[0m[0;37mINFO 2019/02/22 00:19:15 init.go:234: auth agent: not found: 21ec5d7b364740cb43e2d1f15081b355
[0m2019/02/22 00:19:15.916 127.0.0.1 404 2.825135ms GET /agents/21ec5d7b364740cb43e2d1f15081b355

roma.novikov Michael Coburn Can someone please help me with the issue? it is blocking us from our PROD deployment.

Hi avinashgudapati

Sorry for the delay, we’re discussing this internally with the Engineering team, and someone will get back to you shortly. Thanks for your patience,

I’m stepping in for avinashgudapati while he’s out of the office so I’ll be closely following this thread for updates.

Michael Coburn do you have any updates for us on this issue?

roma.novikov , or Michael Coburn , I was curious how things were going with the engineering team?

Hi all,

This is being handled over ticket #251704.