Pmm2-client 2.19 Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server

I’ve just upgraded a pmm2-client to 2.19, and since then I’ve been getting a ‘no data’ alert on the server. In the past, removing and reinstalling the client package has resolved this.
This time, I removed and reinstalled the client package and re-configured the agent successfully, but adding the service fails with the error:

Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server. Running the agent manually then reports that the address on port 7777 is already in use, but I can see that it’s only in use by the pmm2-client itself.

Here are the steps carried out (I’ve pulled together the output from a few session windows, so the timestamps don’t always line up, but the order and outputs are accurate):


[root@[host] ~]# netstat -ln | grep "7777"
[root@[host] ~]# yum install pmm2-client
Last metadata expiration check: 2:01:28 ago on Tue 20 Jul 2021 08:40:30 AM CEST.
Dependencies resolved.
==================================================================================================================
 Package                  Architecture        Version                   Repository                           Size
==================================================================================================================
Installing:
 pmm2-client              x86_64              2.19.0-6.el8              percona-release-x86_64               43 M

Transaction Summary
==================================================================================================================
Install  1 Package

Total download size: 43 M
Installed size: 43 M
Is this ok [y/N]: y
Downloading Packages:
pmm2-client-2.19.0-6.el8.x86_64.rpm                                               8.6 MB/s |  43 MB     00:04
------------------------------------------------------------------------------------------------------------------
Total                                                                             8.6 MB/s |  43 MB     00:04
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                          1/1
  Running scriptlet: pmm2-client-2.19.0-6.el8.x86_64                                                          1/1
  Installing       : pmm2-client-2.19.0-6.el8.x86_64                                                          1/1
  Running scriptlet: pmm2-client-2.19.0-6.el8.x86_64                                                          1/1
  Verifying        : pmm2-client-2.19.0-6.el8.x86_64                                                          1/1

Installed:
  pmm2-client-2.19.0-6.el8.x86_64

Complete!
[root@[host] ~]# netstat -ln | grep "7777"
tcp        0      0 127.0.0.1:7777          0.0.0.0:*               LISTEN
[root@[host] ~]#  pmm-admin config --force --server-insecure-tls --server-url=https://[user]:[pass]@[server]:443 [client IP] generic Prod[host]
Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.
Checking local pmm-agent status...
pmm-agent is running.
[root@[host] ~]#  pmm-admin add mysql --debug --username=[dbuser] --password=[dbpass] --query-source=perfschema --service-name=Prod[host] --host=127.0.0.1 --port=3307 --environment=PROD
DEBUG 2021-07-20 08:37:36.945657542Z: POST /local/Status HTTP/1.1
Host: 127.0.0.1:7777
User-Agent: Go-http-client/1.1
Content-Length: 3
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{}

DEBUG 2021-07-20 08:37:36.947292995Z: HTTP/1.1 200 OK
Content-Length: 413
Content-Type: application/json
Date: Tue, 20 Jul 2021 08:37:36 GMT
Grpc-Metadata-Content-Type: application/grpc

{
  "agent_id": "/agent_id/65482cf6-0dd8-4a50-939c-ec772cc2105b",
  "runs_on_node_id": "",
  "server_info": {
    "url": "https://[user]:[pass]@[server]:443/",
    "insecure_tls": true,
    "connected": false,
    "version": "",
    "latency": null,
    "clock_drift": null
  },
  "agents_info": [
  ],
  "config_filepath": "/usr/local/percona/pmm2/config/pmm-agent.yaml",
  "agent_version": "2.19.0"
}
Failed to get PMM Server parameters from local pmm-agent: pmm-agent is not connected to PMM Server.

[root@[host] ~]# pmm-admin status
Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server.
[root@[host] ~]#  pmm-agent --debug --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml                      
INFO[2021-07-20T10:43:06.166+02:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml.  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/node_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/rds_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/azure_exporter  component=main
INFO[2021-07-20T10:43:06.167+02:00] Using /usr/local/percona/pmm2/exporters/vmagent  component=main
DEBU[2021-07-20T10:43:06.167+02:00] Loaded configuration: &{ID:/agent_id/95d309df-72a3-4186-9ef8-b8ed1780174c ListenAddress:127.0.0.1 ListenPort:7777 Server:{Address:[ip address]:443 Username:[user] Password:[password] InsecureTLS:true WithoutTLS:false} Paths:{ExportersBase:/usr/local/percona/pmm2/exporters NodeExporter:/usr/local/percona/pmm2/exporters/node_exporter MySQLdExporter:/usr/local/percona/pmm2/exporters/mysqld_exporter MongoDBExporter:/usr/local/percona/pmm2/exporters/mongodb_exporter PostgresExporter:/usr/local/percona/pmm2/exporters/postgres_exporter ProxySQLExporter:/usr/local/percona/pmm2/exporters/proxysql_exporter RDSExporter:/usr/local/percona/pmm2/exporters/rds_exporter AzureExporter:/usr/local/percona/pmm2/exporters/azure_exporter VMAgent:/usr/local/percona/pmm2/exporters/vmagent TempDir:/tmp PTSummary:/usr/local/percona/pmm2/tools/pt-summary PTPgSummary:/usr/local/percona/pmm2/tools/pt-pg-summary PTMySqlSummary:/usr/local/percona/pmm2/tools/pt-mysql-summary PTMongoDBSummary:/usr/local/percona/pmm2/tools/pt-mongodb-summary SlowLogFilePrefix:} Ports:{Min:42000 Max:51999} Debug:true Trace:false Setup:{NodeType: NodeName: MachineID: Distro: ContainerID: ContainerName: NodeModel: Region: Az: Address: MetricsMode: DisableCollectors: CustomLabels: Force:false SkipRegistration:false}}  component=main
INFO[2021-07-20T10:43:06.168+02:00] Starting...                                   component=client
INFO[2021-07-20T10:43:06.168+02:00] Starting local API server on http://127.0.0.1:7777/ ...  component=local-server/JSON
DEBU[2021-07-20T10:43:06.168+02:00] Debug handlers:
        http://127.0.0.1:7777/debug/metrics
        http://127.0.0.1:7777/debug/vars
        http://127.0.0.1:7777/debug/requests
        http://127.0.0.1:7777/debug/events
        http://127.0.0.1:7777/debug/pprof  component=local-server/JSON
INFO[2021-07-20T10:43:06.168+02:00] Connecting to https://[user]:***@[host]:443/ ...  component=client
DEBU[2021-07-20T10:43:06.168+02:00] Starting gRPC server on http://127.0.0.1:42927/ ...  component=local-server/gRPC
DEBU[2021-07-20T10:43:06.168+02:00] Reflection and channelz are enabled.          component=local-server/gRPC
INFO[2021-07-20T10:43:06.171+02:00] Started.                                      component=local-server/JSON
PANI[2021-07-20T10:43:06.171+02:00] listen tcp 127.0.0.1:7777: bind: address already in use  component=local-server/JSON
panic: (*logrus.Entry) 0xc0002ef730

goroutine 157 [running]:
github.com/sirupsen/logrus.Entry.log(0xc0001becb0, 0xc00003c1e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:239 +0x2ea
github.com/sirupsen/logrus.(*Entry).Log(0xc0002f7340, 0x0, 0xc000b85f90, 0x1, 0x1)
        /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:268 +0xf0
github.com/sirupsen/logrus.(*Entry).Panic(0xc0002f7340, 0xc000b85f90, 0x1, 0x1)
        /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:306 +0x55
github.com/percona/pmm-agent/agentlocal.(*Server).runJSONServer.func2(0xc0002f7340, 0xc0002e2380)
        /tmp/go/src/github.com/percona/pmm-agent/agentlocal/agent_local.go:297 +0xf2
created by github.com/percona/pmm-agent/agentlocal.(*Server).runJSONServer
        /tmp/go/src/github.com/percona/pmm-agent/agentlocal/agent_local.go:294 +0xcf3

Run netstat -nltp to see what process is already listening on port 7777.
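
For example (ss or lsof will do the same job if net-tools isn’t installed):

netstat -nltp | grep 7777
ss -nltp 'sport = :7777'
lsof -i :7777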

Already checked that with the netstat in the CLI output above; it’s the agent itself:

tcp 0 0 127.0.0.1:7777 0.0.0.0:* LISTEN 471054/pmm-agent

Hi @Andrew2,
Since you have a thread dump, I suggest opening an issue at https://jira.percona.com for Percona Monitoring and Management.

I left it for a day or so and had intended to run through the same process again to raise a ticket, but this time it worked, even though nothing had changed as far as I know.
I’m now getting the same error on a different server, which is a new install, so I’ll use that one to raise a ticket.

After some more digging, I’ve identified the issue.
The port clash was a red herring; that was my own fault for running pmm-agent with --debug without shutting the service down first.
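
(For anyone hitting the same thing: stop the service before running the binary by hand. A rough sketch, assuming the systemd unit installed by pmm2-client is named pmm-agent:)

systemctl stop pmm-agent
pmm-agent --debug --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml
# ...and once done debugging:
systemctl start pmm-agent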

The underlying cause was that the agent was unable to establish ‘two-way communication’, which, after a lot of digging around the network, seems to have been caused by the Docker daemon on the PMM server running with SELinux enabled.
As the host server is well within our local network, I wasn’t too concerned about running Docker without SELinux, so I disabled it, and that appears to have resolved all the intermittent connection issues.
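
In case it’s useful to anyone else, SELinux labelling for Docker can be switched off either daemon-wide or per container. These commands are illustrative only; the actual pmm-server run flags will differ per setup:

# daemon-wide: set "selinux-enabled": false in /etc/docker/daemon.json, then
systemctl restart docker
# or per container: disable label confinement when (re)creating the container
docker run -d --name pmm-server --security-opt label=disable -p 443:443 percona/pmm-server:2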
Awesome. Glad you were able to pinpoint the issue.

Unfortunately, it seems to have been a coincidence. Even with SELinux disabled for the Docker instance, I’m still getting occasional drops in metrics data.
On one box, I’ve enabled pmm-agent logging to /var/log, and the logs are full of these when the issue arises.

INFO[2021-10-13T13:21:28.179+02:00] Connecting to https://admin:***@192.168.x.x:443/ ...  component=client
INFO[2021-10-13T13:21:28.286+02:00] Connected to 192.168.x.x:443.              component=client
INFO[2021-10-13T13:21:28.286+02:00] Establishing two-way communication channel ...  component=client
ERRO[2021-10-13T13:21:33.181+02:00] Failed to establish two-way communication channel: context canceled.  component=client

and on another, this:

INFO[2021-10-13T13:30:43.251+02:00] Connecting to https://admin:***@192.168.x.x:443/ ...  component=client
INFO[2021-10-13T13:30:43.358+02:00] Connected to 192.168.x.x:443.              component=client
INFO[2021-10-13T13:30:43.358+02:00] Establishing two-way communication channel ...  component=client
DEBU[2021-10-13T13:30:43.358+02:00] Sending message (4 bytes): id:1 ping:{}.      component=channel
DEBU[2021-10-13T13:30:48.256+02:00] Closing with error: rpc error: code = Canceled desc = context canceled
failed to receive message
github.com/percona/pmm-agent/client/channel.(*Channel).runReceiver
        /tmp/go/src/github.com/percona/pmm-agent/client/channel/channel.go:215
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1371  component=channel
DEBU[2021-10-13T13:30:48.256+02:00] Exiting receiver goroutine.                   component=channel
ERRO[2021-10-13T13:30:48.256+02:00] Failed to establish two-way communication channel: context canceled.  component=client
DEBU[2021-10-13T13:30:48.256+02:00] Connection closed.                            component=client

and pmm-admin status still just returns:

Failed to get PMM Agent status from local pmm-agent: pmm-agent is not connected to PMM Server.

If I stop the pmm-agent service for some time (I’ve left it until the following day), it seems to be fine, which suggests something is throttling access somewhere, but I can’t identify where. Any ideas?
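
In the meantime, tailing the agent’s journal should catch the exact moment the drops start (assuming the agent logs to journald under the pmm-agent unit):

journalctl -u pmm-agent -f | grep -iE 'two-way|Connected|Failed'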

Restarting the Docker container seems to allow the two-way communication to be re-established.
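
i.e. something like this on the PMM server host, assuming the container is named pmm-server as in a standard install:

docker restart pmm-server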

I’ll do some digging into networking and Docker to see if I can find out where and why it sometimes seems to prevent communication.

Still dealing with this, but I’ve not found anything new.

Periodically, around half my nodes stop being reported, and those I enable logging on show the same ‘two-way communication’ error. I’ve not managed to find anything that the affected nodes have in common, but it does tend to be the same ones affected.

Restarting the Docker container restores the metrics.

Here are a few basic ideas that may trigger something (none of them necessarily PMM-specific, just things I’ve encountered over time):

Do you register your PMM clients with the server using IP or DNS? I’ve seen issues like this with out-of-sync DNS servers (two of three return one response and the third returns nothing).
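
A quick way to compare what each resolver returns (illustrative only; the nameserver IPs and pmm.example.com are placeholders for your own):

# query each configured nameserver directly and compare the answers
for ns in 10.0.0.1 10.0.0.2 10.0.0.3; do
  echo "== $ns =="
  dig @"$ns" pmm.example.com +short
done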

I was running a system that had both SELinux and firewalld on it, so I had to fix both before everything worked as expected.
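
Rough checks for both, assuming standard RHEL/CentOS tooling:

getenforce                # Enforcing / Permissive / Disabled
firewall-cmd --state
firewall-cmd --list-all   # confirm 443 to the server and the agent exporter port range are open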

Another system had fail2ban running without our local network in its ignore list, so a clumsy co-worker would get a machine banned for 10 minutes, and by the time I’d investigated, the issue would be gone.
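
If fail2ban might be involved, something along these lines shows active bans and excludes your monitoring network (sshd is just an example jail name):

fail2ban-client status        # list the configured jails
fail2ban-client status sshd   # show currently banned IPs for one jail
# add your local network to ignoreip in /etc/fail2ban/jail.local, e.g.
#   ignoreip = 127.0.0.1/8 192.168.0.0/16
# then restart fail2ban to pick it up
systemctl restart fail2ban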

It might also be worth looking at the logs on the PMM server side to see if that sheds any light:
on your PMM server you can run docker exec -it pmm-server bash and look in /srv/logs. I’d start with pmm-managed.log at around the same timestamps to see if there are corresponding errors, or possibly nginx.log, since that’s the first thing your client really talks to when establishing connectivity to the PMM server.
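
For example (the paths are those inside the pmm-server container; the grep patterns are just a starting point):

docker exec -it pmm-server bash
ls -ltr /srv/logs
grep -i error /srv/logs/pmm-managed.log | tail -n 50
tail -n 100 /srv/logs/nginx.log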