For a while now, the client nodes, and also pmm-agent on the server itself, have been unable to connect properly to the server. I'm not sure when the problem started, but it goes back to at least Sep 27, which is as far back as the log goes. Both the clients and the server are up to date with 2.22.0, and as far as I remember the problem existed before that version.
The server is deployed from the OVA into KVM and sits on the same private network as the clients, via a secondary interface, so there is no proxy or similar between the clients and the server.
This is the pmm-agent log from one of the nodes:
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.103+02:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Starting… component=client
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Connecting to https://admin:***@192.168.0.221:443/ … component=client
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.105+02:00] Starting local API server on http://127.0.0.1:7777/ … component=local-server/JSON
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.106+02:00] Started. component=local-server/JSON
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.121+02:00] Connected to 192.168.0.221:443. component=client
Oct 06 08:06:22 node.example.com pmm-agent[12652]: INFO[2021-10-06T08:06:22.121+02:00] Establishing two-way communication channel … component=client
Oct 06 08:06:27 node.example.com pmm-agent[12652]: ERRO[2021-10-06T08:06:27.110+02:00] Failed to establish two-way communication channel: context canceled. component=client
There is a different error when the server starts up. I have attached that part of the log: pmm-agent.log (180.2 KB)
Thanks for posting your question here. I have added a comment in the Slack thread to help anyone else to find their way here. I will take a look at the attached log and the OVF version and see if anything stands out.
In the meantime, please could you check and confirm the following:
Communication is possible from the PMM agent nodes to the server, e.g.
curl -X GET --dump-header /dev/stdout https://<your PMM server address>/v1/readyz
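If it is easier to run the same check from a script on each node, here is a minimal Python sketch of it. It only uses the standard library and the /v1/readyz endpoint from the curl command above. The helper names (readyz_url, describe_failure, check_ready) are made up for illustration and are not part of PMM. Note that Python verifies the server certificate against the local trust store by default, so a self-signed or expired certificate will be reported as rejected, which is useful to know even when the plain TCP connection succeeds.

```python
import ssl
import urllib.error
import urllib.request


def readyz_url(server: str) -> str:
    """Build the readiness URL used in the curl command above."""
    return f"https://{server}/v1/readyz"


def describe_failure(exc: Exception) -> str:
    """Turn an exception raised by urlopen into a short diagnosis."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server was reached and answered, but not with a success code.
        return f"server answered with HTTP {exc.code}"
    # urllib wraps low-level errors in URLError; unwrap to the real cause.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        detail = getattr(reason, "verify_message", None) or str(reason)
        return f"certificate rejected: {detail}"
    return f"connection problem: {reason}"


def check_ready(server: str, timeout: float = 5.0) -> str:
    """Request /v1/readyz with certificate verification enabled."""
    try:
        with urllib.request.urlopen(readyz_url(server), timeout=timeout) as resp:
            return f"ready (HTTP {resp.status})"
    except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
        return describe_failure(exc)


if __name__ == "__main__":
    # Address taken from the log above; replace with your PMM server address.
    print(check_ready("192.168.0.221"))
```

Running it on every affected node and comparing the output should show whether a node fails at the network level, at the TLS level, or only later when the agent sets up its channel.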
Updating the certificate helped. Since the initial connection succeeded, I didn’t think the certificate could be the issue. An improved error message would certainly help in this case, by saying why the connection attempt was canceled.
Thanks for your help
I answered too fast. Updating the certificate actually only fixed the error on some nodes, not on all of them.
Network performance shouldn’t be an issue, since the main traffic goes over other interfaces, and the server load is between 0.1 and 0.4, with most of the CPU used by VictoriaMetrics.