Hi,
pmm-agent dies and requires a restart via sudo systemctl restart pmm-agent.
Here is the log:
● pmm-agent.service - pmm-agent
Loaded: loaded (/lib/systemd/system/pmm-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2022-02-20 09:59:58 GMT; 6min ago
Main PID: 597288 (pmm-agent)
Tasks: 72 (limit: 19100)
Memory: 211.7M
CGroup: /system.slice/pmm-agent.service
├─597288 /usr/sbin/pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml
├─597320 /usr/local/percona/pmm2/exporters/vmagent -envflag.enable=true -httpListenAddr=127.0.0.1:42000 -loggerLevel=INFO -promscrape.config=/tmp/vm_agent/agent_id/0c3af199-5b2c-4383-a150-e23054826538/vmagentscrapecfg -remoteWrite.maxDiskUsagePerURL=1073741824 -remoteWrite.tlsInsecureSkipVerify=true -remoteWrite.tmpDataPath=/tmp/vmagent-temp-dir -remoteWrite.url=https://192.168.20.10:32043/victoriametrics/api/v1/write
├─597321 /usr/local/percona/pmm2/exporters/postgres_exporter --auto-discover-databases --collect.custom_query.hr --collect.custom_query.hr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution --collect.custom_query.lr --collect.custom_query.lr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution --collect.custom_query.mr --collect.custom_query.mr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/medium-resolution --exclude-databases=template0,template1,postgres,cloudsqladmin,pmm-managed-dev,azure_maintenance --web.listen-address=:42001
└─597330 /usr/local/percona/pmm2/exporters/node_exporter --collector.bonding --collector.buddyinfo --collector.cpu --collector.diskstats --collector.entropy --collector.filefd --collector.filesystem --collector.hwmon --collector.loadavg --collector.meminfo --collector.meminfo_numa --collector.netdev --collector.netstat --collector.netstat.fields=^(.*_(InErrors|InErrs|InCsumErrors)|Tcp_(ActiveOpens|PassiveOpens|RetransSegs|CurrEstab|AttemptFails|OutSegs|InSegs|EstabResets|OutRsts|OutSegs)|Tcp_Rto(Algorithm|Min|Max)|Udp_(RcvbufErrors|SndbufErrors)|Udp(6?|Lite6?)_(InDatagrams|OutDatagrams|RcvbufErrors|SndbufErrors|NoPorts)|Icmp6?_(OutEchoReps|OutEchos|InEchos|InEchoReps|InAddrMaskReps|InAddrMasks|OutAddrMaskReps|OutAddrMasks|InTimestampReps|InTimestamps|OutTimestampReps|OutTimestamps|OutErrors|InDestUnreachs|OutDestUnreachs|InTimeExcds|InRedirects|OutRedirects|InMsgs|OutMsgs)|IcmpMsg_(InType3|OutType3)|Ip(6|Ext)_(InOctets|OutOctets)|Ip_Forwarding|TcpExt_(Listen.*|Syncookies.*|TCPTimeouts))$ --collector.processes --collector.standard.go --collector.standard.process --collector.stat --collector.textfile.directory.hr=/usr/local/percona/pmm2/collectors/textfile-collector/high-resolution --collector.textfile.directory.lr=/usr/local/percona/pmm2/collectors/textfile-collector/low-resolution --collector.textfile.directory.mr=/usr/local/percona/pmm2/collectors/textfile-collector/medium-resolution --collector.textfile.hr --collector.textfile.lr --collector.textfile.mr --collector.time --collector.uname --collector.vmstat --collector.vmstat.fields=^(pg(steal_(kswapd|direct)|refill|alloc)_(movable|normal|dma3?2?)|nr_(dirty.*|slab.*|vmscan.*|isolated.*|free.*|shmem.*|i?n?active.*|anon_transparent_.*|writeback.*|unstable|unevictable|mlock|mapped|bounce|page_table_pages|kernel_stack)|drop_slab|slabs_scanned|pgd?e?activate|pgpg(in|out)|pswp(in|out)|pgm?a?j?fault)$ --no-collector.arp --no-collector.bcache --no-collector.conntrack --no-collector.drbd --no-collector.edac --no-collector.infiniband --no-collector.interrupts --no-collector.ipvs --no-collector.ksmd --no-collector.logind --no-collector.mdadm --no-collector.mountstats --no-collector.netclass --no-collector.nfs --no-collector.nfsd --no-collector.ntp --no-collector.qdisc --no-collector.runit --no-collector.sockstat --no-collector.supervisord --no-collector.systemd --no-collector.tcpstat --no-collector.timex --no-collector.wifi --no-collector.xfs --no-collector.zfs --web.disable-exporter-metrics --web.listen-address=:42002
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
Feb 20 10:06:42 fuse pmm-agent[597288]: INFO[2022-02-20T10:06:42.763+00:00] time="2022-02-20T10:06:42Z" level=error msg="error encoding and sending metric family: write tcp 127.0.0.1:42001->127.0.0.1:39884: write: broken pipe\n" source="log.go:184" agentID=/agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce component=agent-process type=postgres_exporter
After restarting pmm-agent, it starts to collect data again with no issues.
Any ideas how to fix this?
Thanks
Hi Fahad,
Is the issue reproducible? Does pmm-agent die if a postgresql service is removed and added again?
Is the issue reproducible?
I didn't do anything special. I just spun up pmm-server via Docker as per the docs.
Then I added the PostgreSQL service like this on the Ubuntu server running Postgres:
sudo pmm-admin add postgresql --username='pmm' --password='my password'
Does pmm-agent die if a postgresql service is removed and added again?
I tried removing it via this:
pmm-admin remove postgresql
pmm-admin remove postgresql /service_id/38deb42a-fb83-4f35-adf5-d11b783cef16
It gives the error:
Service with name "/service_id/38deb42a-fb83-4f35-adf5-d11b783cef16" not found.
So how do I remove and re-add?
I looked here but no luck.
When I restart the agent, it starts to work again for another 12 hours or so.
Services can be removed from monitoring by service_name.
e.g.
pmm-admin remove postgresql myPostgresqlService1
Thanks. There you go:
➜ ~ pmm-admin list
Service type Service name Address and port Service ID
PostgreSQL fuse-postgresql 127.0.0.1:5432 /service_id/38deb42a-fb83-4f35-adf5-d11b783cef16
Agent type Status Metrics Mode Agent ID Service ID
pmm_agent Connected /agent_id/c244be73-6827-497c-b066-7aaa8ddcbad8
node_exporter Running push /agent_id/9eab03be-0ded-4c19-abba-766e5039ab57
postgres_exporter Running push /agent_id/8963d9bb-ff9a-41c8-82b8-c8a0f4a0e8ce /service_id/38deb42a-fb83-4f35-adf5-d11b783cef16
postgresql_pgstatements_agent Running /agent_id/963917ae-40f7-4bda-95cd-ccc09f0dd600 /service_id/38deb42a-fb83-4f35-adf5-d11b783cef16
vmagent Running push /agent_id/0c3af199-5b2c-4383-a150-e23054826538
Removing it with:
➜ ~ pmm-admin remove postgresql fuse-postgresql
Service removed.
➜ ~ pmm-admin list
Service type Service name Address and port Service ID
Agent type Status Metrics Mode Agent ID Service ID
pmm_agent Connected /agent_id/c244be73-6827-497c-b066-7aaa8ddcbad8
node_exporter Running push /agent_id/9eab03be-0ded-4c19-abba-766e5039ab57
vmagent Running push /agent_id/0c3af199-5b2c-4383-a150-e23054826538
The agent still seems to be running:
sudo systemctl status pmm-agent
[sudo] password for fahadshery:
● pmm-agent.service - pmm-agent
Loaded: loaded (/lib/systemd/system/pmm-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-02-21 15:28:56 GMT; 2h 59min ago
Main PID: 1794130 (pmm-agent)
Tasks: 55 (limit: 19100)
Memory: 62.3M
CGroup: /system.slice/pmm-agent.service
├─1794130 /usr/sbin/pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml
├─1794164 /usr/local/percona/pmm2/exporters/node_exporter --collector.bonding --collector.buddyinfo --collector.cpu --collector.diskstats --collector.entropy --coll>
└─1930922 /usr/local/percona/pmm2/exporters/vmagent -envflag.enable=true -httpListenAddr=127.0.0.1:42000 -loggerLevel=INFO -promscrape.config=/tmp/vm_agent/agent_id>
Feb 21 18:26:12 fuse pmm-agent[1794130]: INFO[2022-02-21T18:26:12.327+00:00] 2022-02-21T18:26:12.327Z info VictoriaMetrics/lib/persistentqueue/fastqueue.go:59 >
Feb 21 18:26:12 fuse pmm-agent[1794130]: INFO[2022-02-21T18:26:12.328+00:00] 2022-02-21T18:26:12.327Z info VictoriaMetrics/app/vmagent/remotewrite/client.go:143 >
Feb 21 18:26:12 fuse pmm-agent[1794130]: INFO[2022-02-21T18:26:12.328+00:00] 2022-02-21T18:26:12.328Z info VictoriaMetrics/app/vmagent/main.go:112 started v>
Feb 21 18:26:12 fuse pmm-agent[1794130]: INFO[2022-02-21T18:26:12.328+00:00] 2022-02-21T18:26:12.328Z info VictoriaMetrics/lib/promscrape/scraper.go:96 read>
Feb 21 18:26:12 fuse pmm-agent[1794130]: INFO[2022-02-21T18:26:12.328+00:00] 2022-02-21T18:26:12.328Z info VictoriaMetrics/lib/httpserver/httpserver.go:82 s>
Now I've re-added the postgres service. I'll let you know how long it keeps reporting metrics:
sudo pmm-admin add postgresql --username='pmm' --password='my password'
Unfortunately the error returned. Here is the status:
sudo systemctl status pmm-agent
● pmm-agent.service - pmm-agent
Loaded: loaded (/lib/systemd/system/pmm-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-02-21 15:28:56 GMT; 4h 31min ago
Main PID: 1794130 (pmm-agent)
Tasks: 79 (limit: 19100)
Memory: 196.6M
CGroup: /system.slice/pmm-agent.service
├─1794130 /usr/sbin/pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml
├─1794164 /usr/local/percona/pmm2/exporters/node_exporter --collector.bonding --collector.buddyinfo --collector.cpu --collector.diskstats --collector.entropy --coll>
├─1931050 /usr/local/percona/pmm2/exporters/postgres_exporter --auto-discover-databases --collect.custom_query.hr --collect.custom_query.hr.directory=/usr/local/per>
└─1931068 /usr/local/percona/pmm2/exporters/vmagent -envflag.enable=true -httpListenAddr=127.0.0.1:42000 -loggerLevel=INFO -promscrape.config=/tmp/vm_agent/agent_id>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:41 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:41.006+00:00] time="2022-02-21T20:00:40Z" level=error msg="error encoding and sending metric family: write tcp 127>
Feb 21 20:00:45 fuse pmm-agent[1794130]: INFO[2022-02-21T20:00:45.599+00:00] 2022-02-21T20:00:45.599Z error VictoriaMetrics/lib/promscrape/scrapework.go:258 error when scraping "http://127.0.0.1:42003/metrics?collect%5B%5D=custom_query&collect%5B%5D=exporter&collect%5B%5D=standard.go&collect%5B%5D=standard.process" from job "postgres_exporter_agent_id_2ae3cac5-ddf0-4e3a-b8e5-a2b1b6406a23_hr-5s" with labels {agent_id="2ae3cac5-ddf0-4e3a-b8e5-a2b1b6406a23",agent_type="postgres_exporter",instance=="/agent_id/2ae3cac5-ddf0-4e3a-b8e5-a2b1b6406a23",job="postgres_exporter_agent_id_2ae3cac5-ddf0-4e3a-b8e5-a2b1b6406a23_hr-5s",machine_id="/machine_id/f7808b4544aa4de49e1af28f0fac6570",node_id="/node_id/5f52fec3-5757-454c-a96a-3577369297a8",node_name="fuse",node_type="generic",service_id="/service_id/10fe3b80-38d3-4b57-b11b-6266c0c8f133",service_name="fuse-postgresql",service_type="postgresql"}: cannot read data: cannot scrape "http://127.0.0.1:42003/metrics?collect%5B%5D=custom_query&collect%5B%5D=exporter&collect%5B%5D=standard
Hi Fahad,
Do you use any custom query file for postgresql service?
Do you use any custom query file for postgresql service?
Yes, I have this and this as per the instructions.
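For context, here is a quick way to see which custom query files each resolution directory currently holds (a sketch assuming the default PMM2 paths shown in the postgres_exporter flags above):
# list custom query files per resolution directory
ls -l /usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution/
ls -l /usr/local/percona/pmm2/collectors/custom-queries/postgresql/medium-resolution/
ls -l /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution/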
It looks like the exporter can't process the custom queries within the chosen metrics resolution period.
Could you move the query files into the low-resolution folder and restart pmm-agent?
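If any of the query files are currently sitting in the high- or medium-resolution folders, something like this should move them (a sketch assuming the default PMM2 paths from the exporter flags above; adjust the glob if your files use .yml instead of .yaml):
# move custom query files from the high/medium resolution folders into low resolution
sudo mv /usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution/*.yaml \
        /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution/
sudo mv /usr/local/percona/pmm2/collectors/custom-queries/postgresql/medium-resolution/*.yaml \
        /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution/
# then restart the agent so the exporter picks up the new layout
sudo systemctl restart pmm-agent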
Could you move queries files into low-resolution folder and restart pmm-agent?
https://raw.githubusercontent.com/Percona-Lab/pmm-custom-queries/master/postgresql/pg_tuple_statistics.yaml and
https://raw.githubusercontent.com/Percona-Lab/pmm-custom-queries/master/postgresql/pg_table_size-details.yaml are already placed in /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution, as per the instructions. I am not using any other locations…
cd /usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution
➜ low-resolution ll
total 12K
-rw-r--r-- 1 pmm-agent pmm-agent 472 Feb 3 14:17 example-queries-postgres.yml
-rw-r--r-- 1 pmm-agent pmm-agent 1.4K Feb 19 22:05 pg_table_size-details.yaml
-rw-r--r-- 1 pmm-agent pmm-agent 3.7K Feb 19 22:34 pg_tuple_statistics.yaml