Pmm-agent throwing errors

I’ve installed the PMM 2.14 appliance and added two MongoDB nodes (primary / secondary) using pmm2-client.
These nodes were previously monitored with PMM 1.17.4.

After starting pmm-agent, the following errors begin to appear after a few minutes of uptime.
Dashboards, QAN, etc. all continue to work.

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.011+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="error while checking mongodb connection: connection(127.0.0.1:27017[-5]) failed to write: context canceled. mongo_up is set to 0"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.073+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="Cannot get node type to check if this is a mongos: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.074+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="cannot get replSetGetStatus: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.077+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="error while checking mongodb connection: connection(127.0.0.1:27017[-6]) failed to write: context canceled. mongo_up is set to 0"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.077+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="cannot run getDiagnosticData: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Whenever that happens, I need to restart pmm-agent several times before it eventually stops throwing those errors.

It’s obvious from the logs that the agent is trying to establish a connection at 127.0.0.1:27017 but it is timing out. Perhaps you need to specify the correct IP:PORT when configuring the agent?

I was under the impression running

pmm-admin add mongodb --uri mongodb://user:pass@127.0.0.1:27017

was sufficient to configure it?

I can’t understand how all the mongodb metrics / QAN stats etc are all available in the dashboard when the agent seemingly can’t connect.

Can you connect to 127.0.0.1 on port 27017?

For example, what does “telnet 127.0.0.1 27017” report when run on this box?

Yes, mongo is listening on 27017.

Netcat does a clean TCP connect:

nc -v 127.0.0.1 27017
Ncat: Version 7.50 ( Ncat - Netcat for the 21st Century )
Ncat: Connected to 127.0.0.1:27017.

This also works:

./mongo -u mongo_user --authenticationDatabase=admin --host 127.0.0.1
Percona Server for MongoDB shell version v4.0.20-14
Enter password:

This must be some sort of bug in the mongo exporter. Contexts in Go carry cancellation signals and timeouts across goroutines (Go's lightweight threads). It’s possible that goroutine #2 got the connection and caused goroutine #1 to cancel/abort, which would explain why you do have metrics.

Would you mind filling out a bug report at https://jira.percona.com/ ?
