pmm-agent throwing errors

I’ve installed the PMM 2.14 appliance and added two MongoDB nodes (primary / secondary) using pmm2-client.
These nodes were previously monitored with PMM 1.17.4.

After I start pmm-agent, these errors begin showing up within a few minutes of uptime.
Dashboards, QAN, etc. all continue to work.

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.011+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="error while checking mongodb connection: connection(127.0.0.1:27017[-5]) failed to write: context canceled. mongo_up is set to 0"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.073+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="Cannot get node type to check if this is a mongos: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.074+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="cannot get replSetGetStatus: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.077+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="error while checking mongodb connection: connection(127.0.0.1:27017[-6]) failed to write: context canceled. mongo_up is set to 0"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Feb 03 16:14:30 vlst-mongodb-replica01 pmm-agent[30149]: INFO[2021-02-03T16:14:30.077+01:00]logrus/entry.go:359 logrus.(*Entry).Logln time="2021-02-03T16:14:30+01:00" level=error msg="cannot run getDiagnosticData: connection(127.0.0.1:27017[-6]) failed to write: context canceled"  agentID=/agent_id/7ced8a41-fd19-4ef1-b7a5-73acc0c13a10 component=agent-process type=mongodb_exporter

Whenever that happens, I need to restart pmm-agent several times before it stops throwing those errors.

It’s clear from the logs that the agent is trying to connect to 127.0.0.1:27017 and the operation is failing. Perhaps you need to specify the correct IP:PORT when configuring the agent?

I was under the impression that running

pmm-admin add mongodb --uri mongodb://user:pass@127.0.0.1:27017

was sufficient to configure it. Is that not the case?

I can’t understand how the MongoDB metrics, QAN stats, etc. are all available in the dashboard when the agent seemingly can’t connect.

Can you connect to 127.0.0.1 on port 27017?

For example, what does “telnet 127.0.0.1 27017” report when run on this box?

Yes, mongo is listening on 27017.

Netcat does a clean TCP connect:

nc -v 127.0.0.1 27017
Ncat: Version 7.50 ( Ncat - Netcat for the 21st Century )
Ncat: Connected to 127.0.0.1:27017.

This also works:

./mongo -u mongo_user --authenticationDatabase=admin --host 127.0.0.1
Percona Server for MongoDB shell version v4.0.20-14
Enter password:

This must be some sort of bug in the MongoDB exporter. Contexts in Go carry cancellation signals and timeouts across goroutines. It’s possible that goroutine #2 got the connection and caused goroutine #1 to cancel/abort, which would explain why you do have metrics.

Would you mind filing a bug report at https://jira.percona.com/ ?