pmm2-client - Failed to establish two-way communication channel

We are having issues connecting pmm2-client to pmm-server (PMM2). Everything worked from March 2020 until 5 PM on 29 September. Then, all of a sudden, the metrics from all PMM clients stopped. The client logs show that pmm-agent fails to connect to the PMM server with the error below.

Error Log

Oct 8 04:51:07 n01-nsy1 pmm-agent: INFO[2020-10-08T04:51:07.085+00:00] Connected to pmm.xxxxxxxx.com:443. component=client
Oct 8 04:51:07 n01-nsy1 pmm-agent: INFO[2020-10-08T04:51:07.085+00:00] Establishing two-way communication channel … component=client
Oct 8 04:51:07 n01-nsy1 pmm-agent: DEBU[2020-10-08T04:51:07.085+00:00] Sending message (4 bytes): id:1 ping:<> . component=channel
Oct 8 04:51:07 n01-nsy1 pmm-agent: DEBU[2020-10-08T04:51:07.090+00:00] Closing with error: rpc error: code = Unknown desc = : HTTP status code 464; transport: missing content-type field
Oct 8 04:51:07 n01-nsy1 pmm-agent: failed to receive message
Oct 8 04:51:07 n01-nsy1 pmm-agent: github.com/percona/pmm-agent/client/channel.(*Channel).runReceiver
Oct 8 04:51:07 n01-nsy1 pmm-agent: /tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:199
Oct 8 04:51:07 n01-nsy1 pmm-agent: runtime.goexit
Oct 8 04:51:07 n01-nsy1 pmm-agent: /usr/local/go/src/runtime/asm_amd64.s:1373 component=channel
Oct 8 04:51:07 n01-nsy1 pmm-agent: DEBU[2020-10-08T04:51:07.090+00:00] Exiting receiver goroutine. component=channel
Oct 8 04:51:07 n01-nsy1 pmm-agent: ERRO[2020-10-08T04:51:07.090+00:00] Failed to establish two-way communication channel: : HTTP status code 464; transport: missing content-type field. component=client
Oct 8 04:51:07 n01-nsy1 pmm-agent: DEBU[2020-10-08T04:51:07.090+00:00] Connection closed.


pmm-admin config --server-insecure-tls --server-url='https://admin:xxxxxxx@pmm.xxxxx:443' --force --debug

DEBUG 2020-10-08 04:50:32.690101797Z: Running: pmm-agent --server-address=pmm.xxxxx:443 --server-username=admin --server-password=xxxxxxx --server-insecure-tls --debug setup --force 58.x.x.x generic dev-iboss01-sysdb01-n01-nsy1
DEBUG 2020-10-08 04:50:34.816381163Z: Result: &commands.configResult{Warning:"", Output:"Checking local pmm-agent status…\npmm-agent is running.\nRegistering pmm-agent on PMM Server…\nRegistered.\nConfiguration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.\nReloading pmm-agent configuration…\nConfiguration reloaded.\nChecking local pmm-agent status…\npmm-agent is running."}
DEBUG 2020-10-08 04:50:34.816405248Z: Error:
Checking local pmm-agent status…
pmm-agent is running.
Registering pmm-agent on PMM Server…
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration…
Configuration reloaded.
Checking local pmm-agent status…
pmm-agent is running.
[root@n01-nsy1 ~]#

[root@n01-nsy1 ~]# pmm-admin status
Failed to get PMM Server parameters from local pmm-agent: pmm-agent is running, but not set up.
Please run pmm-admin config with --server-url flag.


The client can telnet to the PMM server on port 443, and the PMM server can telnet to the client on ports 4200x.
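Telnet only shows whether a TCP connection opens. The same check can be scripted so that it also reports how a connection fails. Below is a minimal Python sketch, assuming Python 3.9+. The function name check_tcp_port is hypothetical and is not part of PMM.

```python
import errno
import socket


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connect, roughly what `telnet host port` does.

    Returns a short status string instead of raising, so several
    host/port pairs can be checked in one go.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a TCP reset: nothing is listening there.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # "No route to host": also what a firewall REJECT rule using
            # icmp-host-prohibited looks like to the connecting side.
            return "unreachable"
        return f"error: {exc}"
```

For example, you could run check_tcp_port("pmm.example.com", 443) on the client and check_tcp_port("<client-ip>", 42000) on the server. Both hosts are placeholders. A result of "open" on 443 only shows that the TCP layer works. As this thread shows, the port can be reachable while the gRPC channel on top of it still fails.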

We even upgraded the PMM server and client to the latest version, 2.10.1, but the agent still fails to connect to the server.

Can somebody please help me with this?

No idea if this is what you’re running into, but I just wasted an entire afternoon (emphasis on ENTIRE) on a CentOS 8 system where communication was blocked just as you described. It turned out to be a weird combination of SELinux and firewalld (which I didn’t even realize was running).

I could do the same things you could:

  • telnet from pmm-client to pmm-server on TCP/443
  • telnet from pmm-server to pmm-client on TCP/42000 and 42001

But I wasn’t getting metrics, and the Prometheus targets page (https:///prometheus/targets) showed a “no route to host” error.

I ended up just disabling SELinux and the firewalld service. This is a test system, and I’m not at all advocating that you do this in production, but give those a look. You could also check “uptime” on the system to see whether the outage lines up with a kernel update or some other event that required a restart.

If that isn’t it, are there any unhealthy targets on your Prometheus targets page whose error messages you could share?
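The Prometheus HTTP API can return the same information as the targets page. Its /api/v1/targets endpoint lists every active target along with its health and lastError fields. Below is a minimal sketch based on a few assumptions: Prometheus is reachable under https://<pmm-server>/prometheus (inferred from the targets page path above), and the admin credentials used with --server-url also work here. The helper names fetch_targets and unhealthy_targets are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def fetch_targets(base_url, user=None, password="", insecure=False, timeout=10.0):
    """GET {base_url}/api/v1/targets and return the decoded JSON.

    ASSUMPTION: base_url is where PMM exposes Prometheus, e.g.
    "https://<pmm-server>/prometheus" (same prefix as the targets page).
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/api/v1/targets")
    if user is not None:
        # ASSUMPTION: basic auth with the same admin user as pmm-admin.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", "Basic " + token)
    ctx = ssl.create_default_context()
    if insecure:
        # Skip certificate checks, in the spirit of --server-insecure-tls.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (scrapeUrl, lastError) for every active target not reported "up"."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]
```

Typical use would be unhealthy_targets(fetch_targets("https://pmm.example.com/prometheus", user="admin", password="...", insecure=True)). The host is a placeholder. The fetching and filtering are kept separate so the filter can be tried on a saved JSON response.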

In my case, the pmm-agent on the client itself is failing:

Oct 21 11:43:39 sonar-vmhost02-mysql-esy1 pmm-agent: INFO[2020-10-21T11:43:39.193+11:00] Connected to pmm.mnfgroup.limited:443. component=client
Oct 21 11:43:39 sonar-vmhost02-mysql-esy1 pmm-agent: INFO[2020-10-21T11:43:39.193+11:00] Establishing two-way communication channel … component=client
Oct 21 11:43:39 sonar-vmhost02-mysql-esy1 pmm-agent: ERRO[2020-10-21T11:43:39.197+11:00] Failed to establish two-way communication channel: : HTTP status code 464; transport: missing content-type field. component=client

On the Prometheus targets page, the errors are as follows:

Get "http://X.X.X.X:42000/metrics?collect%5B%5D=custom_query.hr&collect%5B%5D=global_status&collect%5B%5D=info_schema.innodb_metrics&collect%5B%5D=standard.go&collect%5B%5D=standard.process": context deadline exceeded

Get "http://X.X.X.X:42000/metrics?collect%5B%5D=auto_increment.columns&collect%5B%5D=binlog_size&collect%5B%5D=custom_query.lr&collect%5B%5D=engine_tokudb_status&collect%5B%5D=global_variables&collect%5B%5D=heartbeat&collect%5B%5D=info_schema.clientstats&collect%5B%5D=info_schema.innodb_tablespaces&collect%5B%5D=info_schema.tables&collect%5B%5D=info_schema.tablestats&collect%5B%5D=info_schema.userstats&collect%5B%5D=perf_schema.eventsstatements&collect%5B%5D=perf_schema.file_instances&collect%5B%5D=perf_schema.indexiowaits&collect%5B%5D=perf_schema.tableiowaits": context deadline exceeded

I have set up my PMM server as follows:

PMM Server (EC2 instance) --- ALB (HTTP -> HTTPS -> EC2) --- Route 53 (alias record for the PMM URL)

Running pmm-agent against the PMM Server IP works fine, but using the URL throws the error above.

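I think the issue here is with the ALB and the lack of native gRPC support in your current target group. pmm-agent talks to PMM Server over gRPC. HTTP 464 is the status an ALB returns when the incoming request protocol is incompatible with the target group's protocol version, for example a gRPC request forwarded to an HTTP/1.1 target group. To overcome this, you can try the following: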

  • Create a new target group with "Protocol": "HTTPS", "ProtocolVersion": "GRPC", "Port": 443.
  • Register your EC2 instance in the newly created target group with "Port": 443.
  • Create a new rule in your existing ALB listener. Configure it to send traffic to the new target group only when the request matches one of the following Content-Type headers:
    • application/grpc+proto
    • application/grpc
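The three steps above correspond to three calls in the AWS elbv2 API: create_target_group, register_targets and create_rule. The minimal sketch below only builds the request parameters as plain dicts, so it runs without AWS access. The target group name, VPC ID, ARNs, instance ID and rule priority are hypothetical placeholders. Health-check settings are left at their defaults and may need tuning.

```python
# Builds keyword arguments for boto3's "elbv2" client; nothing here calls AWS.
GRPC_CONTENT_TYPES = ["application/grpc+proto", "application/grpc"]


def target_group_kwargs(name: str, vpc_id: str) -> dict:
    """create_target_group: HTTPS on 443 with the gRPC protocol version."""
    return {
        "Name": name,
        "Protocol": "HTTPS",
        "ProtocolVersion": "GRPC",
        "Port": 443,
        "VpcId": vpc_id,
        "TargetType": "instance",
    }


def register_targets_kwargs(target_group_arn: str, instance_id: str) -> dict:
    """register_targets: the PMM Server EC2 instance on port 443."""
    return {
        "TargetGroupArn": target_group_arn,
        "Targets": [{"Id": instance_id, "Port": 443}],
    }


def grpc_rule_kwargs(listener_arn: str, target_group_arn: str, priority: int) -> dict:
    """create_rule: forward to the gRPC target group on gRPC Content-Types only."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {
                "Field": "http-header",
                "HttpHeaderConfig": {
                    "HttpHeaderName": "Content-Type",
                    "Values": list(GRPC_CONTENT_TYPES),
                },
            }
        ],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```

With boto3 installed and AWS credentials configured, each dict would be unpacked into the matching client method, for example boto3.client("elbv2").create_target_group(**target_group_kwargs("pmm-grpc", "vpc-0123")). The TargetGroupArn needed by the other two calls comes from the TargetGroups list in that response. All other requests keep hitting the listener's existing default action, so the web UI continues to work as before.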

Please write back if you need more details on setting this up, or to let us know whether the issue is solved.

Cheers
