Thank you so much for your reply @Charly_Batista
Looks like we got hit by this bug
Unfortunately, the pmm user that we created is a superuser, so the connection limit is not honored.
Additional information about the behavior we observed: the connections were mostly running select version() on each database and then moving to the idle state, and the connection count looked proportional to the number of databases. We have 41.
I tried downgrading pmm-server and pmm-client. This at least mitigated the connection issue, but I now see an unusual number of loopback connections on the server.
Netstat output (few lines from it):
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:35530 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37094 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:35966 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37184 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37366 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37578 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36110 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36788 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36214 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51200 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51050 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:50958 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:49052 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51228 localhost:postgresql TIME_WAIT
pt-summary output (IPs hidden):
Network Connections
Connections from remote IP addresses
PMM-server 8
127.0.0.1 1250
Connections to local IP addresses
local-server 70
127.0.0.1 1250
Connections to top 10 local ports
42001 3
42002 3
54574 1
54576 1
54578 1
54580 1
55210 1
55218 1
59166 1
States of connections
ESTABLISHED 70
LISTEN 15
TIME_WAIT 1250
As you can see, the localhost/loopback connection count is huge.
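In case it helps anyone reproduce the numbers, here is a minimal Python sketch that tallies netstat lines by foreign address and state. It assumes the six-column `netstat` layout shown above; the function name `tally_connections` and the `SAMPLE` text are only for illustration, not part of any PMM or Percona tool.

```python
from collections import Counter

# A few lines copied from the netstat output above (illustration only).
SAMPLE = """\
tcp 0 0 localhost:35530 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37094 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51228 localhost:postgresql TIME_WAIT
"""


def tally_connections(netstat_text):
    """Count TCP connections per (foreign address, state) pair."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        # Header lines and anything else with a different shape are skipped.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        counts[(foreign, state)] += 1
    return counts


if __name__ == "__main__":
    for (foreign, state), n in tally_connections(SAMPLE).most_common():
        print(f"{n:6d}  {state:<12} {foreign}")
```

Feeding it the full netstat output instead of `SAMPLE` should show whether nearly all TIME_WAIT entries point at localhost:postgresql, as the pt-summary numbers suggest.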
Other answers:
Have you changed anything on those hosts?
We recently reinstalled the OS of the pmm-server (changed from CentOS to Ubuntu 22.04) and reinstalled PMM. No changes on the Postgres servers.
Can you also share what is the OS name and version?
PMM-server : Ubuntu 22.04
PMM-Clients : Ubuntu 16.04 & Ubuntu 18.04
Is your DB behind a firewall or does it need any proxy, NAT, or any network redirect to access the PMM server?
The PMM server is in one datacenter and 3 servers are in another datacenter; 1 server is in the same datacenter as the PMM server. Cross-datacenter traffic has to go through a firewall, while traffic within the same datacenter does not (different VLANs). But I will check this.
How about the OS metrics on those nodes, are they being collected by PMM?
OS metrics are being collected by PMM and were working fine, no issues there. Only Postgres has the issue.
I also tried changing to --metrics-mode=pull, but it made no difference: the TCP connection count is still high. The Postgres connection issue was resolved after downgrading to version 2.29.
One question: with --metrics-mode=push (default), how does the communication happen? I believe it is still on port 443, right?
Do I need to open any more ports from pmm-client -> pmm-server? I tried to find information on this but didn't find much.
======
Service type  Service name      Address and port  Service ID
PostgreSQL    z8034-postgresql  127.0.0.1:5432    /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7

Agent type                     Status     Metrics Mode  Agent ID                                        Service ID                                        Port
pmm_agent                      Connected                /agent_id/74ca417a-4237-483a-bdef-d1be7bb03eb1                                                    0
node_exporter                  Running    pull          /agent_id/c1556a8a-0279-4e8a-94e9-4b68652608fe                                                    42001
postgres_exporter              Running    pull          /agent_id/31b2db0d-6f07-4b2b-aa01-94b17275f68e  /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7  42002
postgresql_pgstatements_agent  Running                  /agent_id/e87869ef-b8f1-4fd0-bc09-70450b60ef27  /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7  0
vmagent                        Running    push          /agent_id/2e6daf9e-6e77-4cff-a2ad-dc883480df52                                                    42000
root:~# pmm-admin status
Agent ID: /agent_id/74ca417a-4237-483a-bdef-d1be7bb03eb1
Node ID : /node_id/a45df6d6-f2a9-41c0-b2fc-0a68385654f1
PMM Server:
URL : https://pmm-server:443/
Version: 2.29.1
PMM Client:
Connected : true
Time drift : -2.299105ms
Latency : 452.463µs
pmm-admin version: 2.29.1
pmm-agent version: 2.29.1
Agents:
/agent_id/2e6daf9e-6e77-4cff-a2ad-dc883480df52 vmagent Running 42000
/agent_id/31b2db0d-6f07-4b2b-aa01-94b17275f68e postgres_exporter Running 42002
/agent_id/c1556a8a-0279-4e8a-94e9-4b68652608fe node_exporter Running 42001
/agent_id/e87869ef-b8f1-4fd0-bc09-70450b60ef27 postgresql_pgstatements_agent Running 0
I see some more errors in the log when I restarted the pmm-client. Maybe they are from the previous installation?
Jan 6 14:00:08 pmm-agent[22216]: #033[33mWARN#033[0m[2023-01-06T14:00:08.025+01:00] Failed to send StateChanged request. #033[33mcomponent#033[0m=client
Jan 6 14:00:08 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:08.038+01:00] Failed to establish two-way communication channel: No Agent with ID “/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84”. #033[31mcomponent#033[0m=client
Jan 6 14:00:08 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:08.957+01:00] Failed to establish two-way communication channel: No Agent with ID “/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84”. #033[31mcomponent#033[0m=client
Jan 6 14:00:10 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:10.445+01:00] Failed to establish two-way communication channel: No Agent with ID “/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84”. #033[31mcomponent#033[0m=client
Jan 6 14:00:12 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:12.462+01:00] Failed to establish two-way communication channel: No Agent with ID “/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84”. #033[31mcomponent#033[0m=client
Jan 6 14:00:15 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:15.497+01:00] Failed to establish two-way communication channel: No Agent with ID “/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84”. #033[31mcomponent#033[0m=client
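The `#033[31m` / `#033[0m` fragments in these lines look like terminal colour codes that the syslog daemon wrote out as literal text (ESC shown as `#033`). A small Python sketch for stripping them so the messages are easier to read; the regex and the name `strip_colors` are my own for illustration and assume only simple `ESC[...m` colour sequences:

```python
import re

# Assumption: the ESC byte was logged as the literal text "#033", so a
# colour code such as ESC[31m appears as "#033[31m" in the log file.
ANSI_IN_SYSLOG = re.compile(r"#033\[[0-9;]*m")


def strip_colors(line):
    """Remove syslog-escaped ANSI colour sequences from one log line."""
    return ANSI_IN_SYSLOG.sub("", line)


# First log line from above, used as sample input.
SAMPLE = (
    "Jan 6 14:00:08 pmm-agent[22216]: #033[33mWARN#033[0m"
    "[2023-01-06T14:00:08.025+01:00] Failed to send StateChanged request. "
    "#033[33mcomponent#033[0m=client"
)

if __name__ == "__main__":
    print(strip_colors(SAMPLE))
```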