pmm-client making too many connections to the Postgres database

Hi All,

We are long-term users of Percona PMM. We recently reinstalled the pmm-server and upgraded the clients on all our servers.
After the upgrade, all Postgres-related graphs for our master Postgres server are broken. The other graphs are fine and working well.

When I checked the master server, the client log was showing:

Dec 30 06:25:11 pmm-agent[109654]: #033[36mINFO#033[0m[2022-12-30T06:25:11.412+01:00] time="2022-12-30T06:25:11+01:00" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42000->127.0.0.1:37206: write: broken pipe\n" source="log.go:184" #033[36magentID#033[0m=/agent_id/080d7fb3-8b64-41c6-b22d-d94c642d5b76 #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=postgres_exporter
Dec 30 06:25:11 pmm-agent[109654]: message repeated 20 times: [ #033[36mINFO#033[0m[2022-12-30T06:25:11.412+01:00] time="2022-12-30T06:25:11+01:00" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42000->127.0.0.1:37206: write: broken pipe\n" source="log.go:184" #033[36magentID#033[0m=/agent_id/080d7fb3-8b64-41c6-b22d-d94c642d5b76 #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=postgres_exporter]
Dec 30 06:25:11 pmm-agent[109654]: #033[36mINFO#033[0m[2022-12-30T06:25:11.413+01:00] time="2022-12-30T06:25:11+01:00" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42000->127.0.0.1:37206: write: broken pipe\n" source="log.go:184" #033[36magentID#033[0m=/agent_id/080d7fb3-8b64-41c6-b22d-d94c642d5b76 #033[36mcomponent#033[0m=agent-process #033[36mtype#033[0m=postgres_exporter

I tried to upgrade the client, but even with the latest repo configured on the server, apt is not fetching the latest version. I tried installing the latest repo package, running apt clean, and reinstalling as well.
The second problem is that pmm-client is making a huge number of connections to the Postgres instance, as high as 157.

Can you please tell me how to limit the connections, and what the possible cause of the broken pipe errors could be?

PMM Server:
URL : https://x.x.x.x:443/
Version: 2.33.0

PMM Client:
Connected : true
Time drift : -2.512813ms
Latency : 768.103µs
Connection uptime: 100
pmm-admin version: 2.24.0
pmm-agent version: 2.24.0
Agents:
/agent_id/ postgresql_pgstatements_agent Running 0
/agent_id/5e2a8701-e04b-49d6-a5a2-d node_exporter Running 42000
/agent_id/ postgres_exporter Running 42001
/agent_id/ vmagent Running 42002


Hello Aniruddha,

Let me try to help here, starting with the last issue you reported: the high number of connections from PMM to PG. This is a known issue that we are still working to fix, and the current workaround is to limit the PMM user to a maximum number of connections, as recommended in the documentation here [1]:

1. Optionally, you can also set up a connection limit (only if the user is not a SUPERUSER):

ALTER USER pmm CONNECTION LIMIT 10;

While this isn’t the final solution, it will mitigate the issue.
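To verify that the limit took effect and see how many connections the monitoring user currently holds, a couple of catalog queries can help (a sketch; it assumes the monitoring user is named pmm, as in the statement above):

```sql
-- Show the configured per-user connection limit (-1 means unlimited).
SELECT rolname, rolconnlimit
FROM pg_roles
WHERE rolname = 'pmm';

-- Count the connections the pmm user currently holds, by state.
SELECT state, count(*)
FROM pg_stat_activity
WHERE usename = 'pmm'
GROUP BY state;
```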

The broken pipe issue isn’t clear to me from those log messages alone. Still, it seems to be related to either a firewall or NAT translation, because the agent appears to be sending the metrics through local ports on the loopback address 127.0.0.1, which is usually done because of NAT or a firewall redirect. Have you changed anything on those hosts? Maybe a new OS install? Can you also share the OS name and version? Is your DB behind a firewall, or does it need any proxy, NAT, or other network redirect to access the PMM server? How about the OS metrics on those nodes, are they being collected by PMM?

[1] PostgreSQL - Percona Monitoring and Management

Regards,

Charly Batista


Thank you so much for your reply @Charly_Batista

Looks like we got hit by this bug :grinning:
Unfortunately, the pmm user that we created is a superuser, so the connection limit is not being honored.
Some additional information about the behavior we observed: the connections were mostly firing the select version() command on each database and then moving to the idle state, and the connection count looks proportional to the number of databases. We have 41. :grinning:
I tried to downgrade the pmm-server & pmm-client; this at least mitigated the connection issue, but I now see an unusual number of loopback connections on the server.
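For anyone chasing the same behavior, the per-database idle connections can be observed directly in pg_stat_activity (a sketch; it assumes the monitoring user is named pmm):

```sql
-- Break down the monitoring user's connections by database and state;
-- with the affected exporter versions this shows roughly one idle
-- connection per monitored database.
SELECT datname, state, count(*) AS conns
FROM pg_stat_activity
WHERE usename = 'pmm'
GROUP BY datname, state
ORDER BY conns DESC;
```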

Netstat output (few lines from it):
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:35530 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37094 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:35966 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37184 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37366 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37578 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36110 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36788 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:36214 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51200 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51050 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:50958 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:49052 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:51228 localhost:postgresql TIME_WAIT

pt-summary output (IPs hidden):

Network Connections

Connections from remote IP addresses
PMM-server 8
127.0.0.1 1250
Connections to local IP addresses
local-server 70
127.0.0.1 1250
Connections to top 10 local ports
42001 3
42002 3
54574 1
54576 1
54578 1
54580 1
55210 1
55218 1
59166 1
States of connections
ESTABLISHED 70
LISTEN 15
TIME_WAIT 1250

As you can see, the localhost/loopback connection count is huge.
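The per-state counts reported by pt-summary can be reproduced with a small awk pipeline over netstat output. A minimal sketch (the here-doc sample stands in for live `netstat -ant` output, where the connection state is also field 6):

```shell
# Count TCP connections per state, as pt-summary does.
# On a real host, feed this from: netstat -ant | tail -n +3
awk '{print $6}' <<'EOF' | sort | uniq -c | sort -rn
tcp 0 0 localhost:35530 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:37094 localhost:postgresql TIME_WAIT
tcp 0 0 localhost:42001 localhost:54574 ESTABLISHED
EOF
```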
Other answers:
Have you changed anything on those hosts?

We recently reinstalled the OS of the pmm-server (changed from CentOS to Ubuntu 22.04) and reinstalled PMM. No changes on the Postgres servers.

Can you also share what is the OS name and version?

PMM-server : Ubuntu 22.04
PMM-Clients : Ubuntu 16.04 & Ubuntu 18.04

Is your DB behind a firewall or does it need any proxy, NAT, or any network redirect to access the PMM server?

The PMM-server is in one datacenter, and 3 servers are in another datacenter; 1 server is in the same datacenter. Cross-datacenter traffic has to go through a firewall; within the same datacenter it does not go through a firewall (different VLANs). But I will check this.

How about the OS metrics on those nodes, are they being collected by PMM?

OS metrics are being collected by PMM and were working fine. No issues there. Only Postgres is having issues.

I tried changing to --metrics-mode=pull as well, but to no avail; the TCP connections are still high. The Postgres connection issue is resolved after downgrading to version 2.29.

One question: with --metrics-mode=push (the default), how does the communication happen? I believe it is still on port 443, right?
Do I need to open any more ports from pmm-client -> pmm-server? I tried to find information on this but didn’t get much help.

======
Service type Service name Address and port Service ID
PostgreSQL z8034-postgresql 127.0.0.1:5432 /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7

Agent type Status Metrics Mode Agent ID Service ID Port
pmm_agent Connected /agent_id/74ca417a-4237-483a-bdef-d1be7bb03eb1 0
node_exporter Running pull /agent_id/c1556a8a-0279-4e8a-94e9-4b68652608fe 42001
postgres_exporter Running pull /agent_id/31b2db0d-6f07-4b2b-aa01-94b17275f68e /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7 42002
postgresql_pgstatements_agent Running /agent_id/e87869ef-b8f1-4fd0-bc09-70450b60ef27 /service_id/62fb49aa-288f-4511-a9af-4ef78019bed7 0
vmagent Running push /agent_id/2e6daf9e-6e77-4cff-a2ad-dc883480df52 42000

root:~# pmm-admin status
Agent ID: /agent_id/74ca417a-4237-483a-bdef-d1be7bb03eb1
Node ID : /node_id/a45df6d6-f2a9-41c0-b2fc-0a68385654f1

PMM Server:
URL : https://pmm-server:443/
Version: 2.29.1

PMM Client:
Connected : true
Time drift : -2.299105ms
Latency : 452.463µs
pmm-admin version: 2.29.1
pmm-agent version: 2.29.1
Agents:
/agent_id/2e6daf9e-6e77-4cff-a2ad-dc883480df52 vmagent Running 42000
/agent_id/31b2db0d-6f07-4b2b-aa01-94b17275f68e postgres_exporter Running 42002
/agent_id/c1556a8a-0279-4e8a-94e9-4b68652608fe node_exporter Running 42001
/agent_id/e87869ef-b8f1-4fd0-bc09-70450b60ef27 postgresql_pgstatements_agent Running 0

I see some more errors in the log when I restarted the pmm-client. Maybe they are from the last installation?:

Jan 6 14:00:08 pmm-agent[22216]: #033[33mWARN#033[0m[2023-01-06T14:00:08.025+01:00] Failed to send StateChanged request. #033[33mcomponent#033[0m=client
Jan 6 14:00:08 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:08.038+01:00] Failed to establish two-way communication channel: No Agent with ID "/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84". #033[31mcomponent#033[0m=client
Jan 6 14:00:08 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:08.957+01:00] Failed to establish two-way communication channel: No Agent with ID "/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84". #033[31mcomponent#033[0m=client
Jan 6 14:00:10 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:10.445+01:00] Failed to establish two-way communication channel: No Agent with ID "/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84". #033[31mcomponent#033[0m=client
Jan 6 14:00:12 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:12.462+01:00] Failed to establish two-way communication channel: No Agent with ID "/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84". #033[31mcomponent#033[0m=client
Jan 6 14:00:15 pmm-agent[22216]: #033[31mERRO#033[0m[2023-01-06T14:00:15.497+01:00] Failed to establish two-way communication channel: No Agent with ID "/agent_id/8197b98c-fdf1-469b-80b7-f3f475fbdb84". #033[31mcomponent#033[0m=client


We also upgraded to the latest postgres_exporter, and upstream was missing this fix:

So we also missed it in 2.33 during the upgrade. It was fixed for 2.34 in [PMM-7958] Databases cannot be deleted while Postgres is being monitored - Percona JIRA: PMM-7958 postgres databases cannot be deleted while monitored by rishat-ishbulatov · Pull Request #90 · percona/postgres_exporter · GitHub


Makes sense. We will wait for the new release for now.
Do you have an estimated date for when this will be released? @Denys_Kondratenko
Thank you again.

It is planned for Jan 24; the preview release is for Jan 17.

You can see the announcement for the preview release here: Open source database blog: free content to learn about … | Percona Community
(like Percona Monitoring and Management 2.33 preview release | Percona Community).

So you could take the preview release and let us know if it is fixed.

Hello aniruddha.deshpande, how many databases do you have in your PG instance? It was identified that the recent version of the PG exporter can consume too many connections when a monitored instance has multiple databases.
We might introduce an option to control how many connections are used and whether you want to re-establish or persist a connection.

There is a tracking issue for that.
https://jira.percona.com/browse/PMM-11341

aniruddha.deshpande, how do you connect PG to PMM? Are you using the CLI or the GUI?