Issue with system.profile Iterator and maxPoolSize in PMM-Agent

Description:

We’ve been encountering a recurring issue as seen in the logs of the PMM-Agent running on both our primary MongoDB instance and the hidden instance responsible for backups. The error message is as follows:

couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100

From the error, it’s evident that all the connections in the pool (totaling the maxPoolSize of 100) are being consumed by other operations, leading to timeouts when attempting to create a system.profile iterator.

I’d like to understand the root cause of this problem and any potential solutions. Specifically:

  • Is there a way to increase the maxPoolSize? If so, would this be a recommended approach given our setup?
  • Are there best practices or configurations to manage the connection pool more effectively, especially when considering both primary and backup instances?
  • Could there be specific operations or queries that might be monopolizing the connections, and how might we identify and optimize them?

Any insights and guidance on this issue would be greatly appreciated.

Version:

MongoDB: 5.0.20-17
PMMVersion: 2.39.0

Logs:

$ sudo journalctl -u pmm-agent.service
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx8 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx7 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx6 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx5 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx4 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx3 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx2 type=qan_mongodb_profiler_agent
Sep 27 08:06:47 mongo-015 pmm-agent[13154]: ERRO[2023-09-27T08:06:47.834+00:00] couldn't create system.profile iterator, reason: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 100  agentID=/agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 component=agent-builtin db=xxxx1 type=qan_mongodb_profiler_agent

Additional Information:

$ pmm-admin status
Agent ID : /agent_id/0b6c7cc4-4e00-49a6-9bdd-7fb7ca8b923b
Node ID  : /node_id/9a1cbc7a-076f-4d11-8fc1-4c59cd8ff38f
Node name: mongo-015

PMM Server:
        URL    : https://192.168.0.98:443/
        Version: 2.39.0

PMM Client:
        Connected        : true
        Time drift       : 3.403513ms
        Latency          : 5.390945ms
        Connection uptime: 71.77
        pmm-admin version: 2.39.0
        pmm-agent version: 2.39.0
Agents:
        /agent_id/4a9eb550-e18c-4372-be13-15da19c98684 mongodb_exporter Running 42002
        /agent_id/6e228d1c-8a57-43b3-89e8-34e11606bc59 mongodb_profiler_agent Running 0
        /agent_id/e2b5688e-5604-44b6-9590-50857a85da1e node_exporter Running 42000
        /agent_id/fe00c190-7608-4b62-b2c8-d55ffde01043 vmagent Running 42001
  • 600+ databases
  • 5 members replicaset - 3 voting members

@Florian, this looks like a bug, so can you submit it to our Bug tracking system jira.percona.com, so the Support and Engineering team can get a closer look?

As an option to investigate - try to use --query-source=none flag when adding MongoDB to turn off QAN and its agent and see if this will change the behavior.

@Roma_Novikov thanks!
The --query-source=none flag is working and useful for the backup server, but I can’t afford to remove QAN on the primary server, as it would negate the whole purpose of the tool.

Hi @Florian,
can we learn how many collections do you have?

Hi @nurlan ,
Currently, we have 32,500 collections.
It’s 55 collections per database.

Hi @Florian,
That’s the reason, our current implementation of mongodb profiler creates connection per DB.

Hi @nurlan ,
Is there a way to increase the maxPoolSize? Or limit the profiled databases ?
(Or any settings thats allows me to use QAN =) )

@Florian , unfortunately no.

@nurlan Thanks for you reply. Do you know if this is planned (6mo - 1yr) ?

@Florian, No at the moment, it’s the first report. Please create a task in our Jira and we will plan it.

Thanks. It’s already done : [PMM-12548] Issue with system.profile Iterator and maxPoolSize in PMM-Agent - Percona JIRA

@Florian Thanks for the report and all the details.
We are now investigating the options to “stop bleeding” with your problem, but in the long run, we probably need to work on [PMM-12327] Low impact on DB performance with QAN for MongoDB - Percona JIRA.