PMM MongoDB exporter crashing

Hi,

I created a new MongoDB cluster using the Percona MongoDB Operator with the PMM agent enabled. After the cluster is initialized, the PMM agent/sidecar on replica set 0 keeps crashing with the following error.

PS: Only one replica set pod manages to run the MongoDB exporter at a time. The PMM server receives metrics from that one PMM agent while the others are stuck in a crash loop.

Percona MongoDB Operator: 1.11.0
PMM agent in CR: percona/pmm-client:2.24.0
MongoDB version: percona/percona-server-mongodb:4.4.6-8
Sharded cluster: YES
PMM server version: percona/pmm-server:2.24.0

my-cluster-name-rs0-2 pmm-client INFO[2022-02-04T09:38:12.436+00:00] 2022-02-04T09:38:12.436Z   error   VictoriaMetrics/lib/promscrape/scrapework.go:258        error when scraping "http://127.0.0.1:30101/metrics" from job "mongodb_exporter_agent_id_xxxxxxxxx" with labels {agent_id="/agent_id/1d630f4e-c453-4dcf-81ac-16f49a61b6fb",agent_type="mongodb_exporter",cluster="my-cluster-name",instance="/agent_id/1d630f4e-c453-4dcf-81ac-16f49a61b6fb",job="mongodb_exporter_agent_id_1d630f4e-c453-4dcf-81ac-16f49a61b6fb_hr-1s",node_id="/node_id/cb57d79a-83e7-4b5a-9b75-2ff662d72534",node_name="psmdb-my-cluster-name-rs0-2",node_type="container",service_id="/service_id/2c1e9535-a274-4dc1-8ffe-774bf018fdd6",service_name="psmdb-my-cluster-name-rs0-2",service_type="mongodb"}: error when scraping "http://127.0.0.1:30101/metrics": dial tcp4 127.0.0.1:30101: connect: connection refused; try -enableTCP6 command-line flag if you scrape ipv6 addresses  agentID=/agent_id/1aa7e263-a7af-489d-9152-5b4d36655ecd component=agent-process type=vm_agent
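
For reference, this is roughly how I am pulling the sidecar logs and restart details (pod and container names as they appear in the log above, namespace from my CR below; adjust if yours differ):

kubectl -n psmdb logs my-cluster-name-rs0-2 -c pmm-client --previous   # output of the last crashed run of the pmm-client sidecar
kubectl -n psmdb describe pod my-cluster-name-rs0-2                    # restart count and last termination reason of each container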

Hello @patricktuapui ,

I will need to reproduce this. Could you please share your operator YAML?
Is the container crashing right away or only after some time?

Hi @spronin ,
The pmm-client container crashes after a few minutes.

apiVersion: psmdb.percona.com/v1-9-0
kind: PerconaServerMongoDB
metadata:
  namespace: psmdb
  name: my-cluster-name
spec:
  crVersion: 1.11.0
  image: percona/percona-server-mongodb:4.4.6-8
  imagePullPolicy: Always
  allowUnsafeConfigurations: false
  #updateStrategy: SmartUpdate
  updateStrategy: RollingUpdate 
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: 4.4-recommended
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: my-cluster-name-secrets
  pmm:
    enabled: true
    image: percona/pmm-client:2.24.0
    serverHost: pmm.myserver
  replsets:

  - name: rs0
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: LoadBalancer
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: standard
        resources:
          requests:
            storage: 5Gi

  sharding:
    enabled: true

    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi

    mongos:
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
      expose:
        exposeType: LoadBalancer

  mongod:
    net:
      port: 27017
      hostPort: 0
    security:
      redactClientLogData: false
      enableEncryption: true
      encryptionKeySecret: my-cluster-name-mongodb-encryption-key
      encryptionCipherMode: AES256-CBC
    setParameter:
      ttlMonitorSleepSecs: 60
      wiredTigerConcurrentReadTransactions: 128
      wiredTigerConcurrentWriteTransactions: 128
    storage:
      engine: wiredTiger
      inMemory:
        engineConfig:
          inMemorySizeRatio: 0.9
      wiredTiger:
        engineConfig:
          cacheSizeRatio: 0.5
          directoryForIndexes: false
          journalCompressor: snappy
        collectionConfig:
          blockCompressor: snappy
        indexConfig:
          prefixCompression: true
    operationProfiling:
      mode: slowOp
      slowOpThresholdMs: 100
      rateLimit: 100

  backup:
    enabled: false
    restartOnFailure: true
    image: percona/percona-server-mongodb-operator:1.9.0-backup
    serviceAccountName: percona-server-mongodb-operator
    #serviceAccountName: mongodb-operator-psmdb-operator
    storages:

    pitr:
      enabled: false
    tasks:

Hello,

I have experienced a similar problem. If pmm.enabled=true while the cluster is being created but the monitoring (PMM) server is not started, the pods crash. If the monitoring server is already running while the cluster is being created, the cluster initializes as expected.

It seems to me that the cluster cannot start if the monitoring server is not up.
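
As a rough check before enabling pmm, something like this (a throwaway curl pod; the pod name and image are hypothetical, the server host is taken from the CR above) shows whether the PMM server answers at all from inside the cluster:

kubectl -n psmdb run pmm-reachability-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://pmm.myserver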

@patricktuapui @Darko ,

Sorry for not coming back earlier, but yes: if the PMM server is down or not reachable, the pmm-client container crashes, and as a result the DB pod does not start.
We understand that this is absurd behavior, and we already have a fix - [K8SPSMDB-537] pmm container should not crash in case of issues - Percona JIRA

It is in the main branch, but not released yet. We plan to release 1.12 this month.
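
Until 1.12 is out, a possible workaround (just a sketch, using the resource name and namespace from the CR above) is to make sure the PMM server is up before the cluster is created, or to temporarily disable PMM in the custom resource:

kubectl -n psmdb patch psmdb my-cluster-name --type=merge -p '{"spec":{"pmm":{"enabled":false}}}'

The same patch with "enabled":true turns monitoring back on once the PMM server is reachable.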
