When I turn on the backup option, operator reports an error:error occured during connection handshake: x509

my cr.yaml:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
  finalizers:
    - delete-psmdb-pods-in-order
#    - delete-psmdb-pvc
spec:
  crVersion: 1.13.0
  image: percona/percona-server-mongodb:5.0.11-10
  imagePullPolicy: Always
  allowUnsafeConfigurations: false
  ....

  backup:
  enabled: true
  image: percona/percona-backup-mongodb:1.8.1
  serviceAccountName: percona-server-mongodb-operator
  ....

when ‘backup’ is true, errors occured:
Reconciler error {“name”: “my-cluster-elvis2”, “namespace”: “default”, “error”: “create pbm object: create PBM connection to my-cluster-elvis2-rs0-0.my-cluster-elvis2-rs0.default.svc.cluster.local:27017,my-cluster-elvis2-rs0-1.my-cluster-elvis2-rs0.default.svc.cluster.local:27017,my-cluster-elvis2-rs0-2.my-cluster-elvis2-rs0.default.svc.cluster.local:27017: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: 10.181.4.179:30862, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.179 because it doesn’t contain any IP SANs }, { Addr: 10.181.4.29:31197, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.29 because it doesn’t contain any IP SANs }, { Addr: 10.181.4.214:30971, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.214 because it doesn’t contain any IP SANs }, ] }”, “errorVerbose”: "server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: 10.181.4.179:30862, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.179 because it doesn’t contain any IP SANs }, { Addr: 10.181.4.29:31197, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.29 because it doesn’t contain any IP SANs }, { Addr: 10.181.4.214:30971, Type: Unknown, Last error: connection() error occured during connection handshake: x509: cannot validate certificate for 10.181.4.214 because it doesn’t contain any IP SANs }

2 Likes

Hey @Worlder_Mo ,

how can I reproduce it?

I deployed operator version 1.13 and default YAML manifest (cr.yaml), but disabled backups.
Enabled it then. The Pods were restarted and all is up now.

I don’t see any issues, nor I see this error in the logs.

1 Like

I didn’t do anything, I just enabled the backup option when I deployed cr.yaml, and the errors occused.
This should be an error with TLS certificates

1 Like

@Worlder_Mo can you share the cr?

1 Like

I used cr.yaml file of the official sample document.

1 Like

@Sergey_Pronin But I enabled the expose of replset.
This is the cr I used:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
  finalizers:
    - delete-psmdb-pods-in-order
    - delete-psmdb-pvc
spec:
#  platform: openshift
#  clusterServiceDNSSuffix: svc.cluster.local
#  clusterServiceDNSMode: "Internal"
#  pause: true
#  unmanaged: false
  crVersion: 1.13.0
  image: percona/percona-server-mongodb:5.0.11-10
  imagePullPolicy: IfNotPresent
#  tls:
#    # 90 days in hours
#    certValidityDuration: 2160h
#  imagePullSecrets:
#    - name: private-registry-credentials
  allowUnsafeConfigurations: false
  updateStrategy: SmartUpdate
#  multiCluster:
#    enabled: true
#    DNSSuffix: svc.clusterset.local
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: my-cluster-name-secrets
    encryptionKey: my-cluster-name-mongodb-encryption-key
#    vault: my-cluster-name-vault
  pmm:
    enabled: false
    image: percona/pmm-client:2.30.0
    serverHost: monitoring-service
#    mongodParams: --environment=ENVIRONMENT
#    mongosParams: --environment=ENVIRONMENT
  replsets:

  - name: rs0
    size: 3
#    externalNodes:
#    - host: 34.124.76.90
#    - host: 34.124.76.91
#      port: 27017
#      votes: 0
#      priority: 0
#    - host: 34.124.76.92
#    # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#    configuration: |
#      net:
#        tls:
#          mode: preferTLS
#      operationProfiling:
#        mode: slowOp
#      systemLog:
#        verbosity: 1
#      storage:
#        engine: wiredTiger
#        wiredTiger:
#          engineConfig:
#            directoryForIndexes: false
#            journalCompressor: snappy
#          collectionConfig:
#            blockCompressor: snappy
#          indexConfig:
#            prefixCompression: true
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
#      advanced:
#        nodeAffinity:
#          requiredDuringSchedulingIgnoredDuringExecution:
#            nodeSelectorTerms:
#            - matchExpressions:
#              - key: kubernetes.io/e2e-az-name
#                operator: In
#                values:
#                - e2e-az1
#                - e2e-az2
#    tolerations:
#    - key: "node.alpha.kubernetes.io/unreachable"
#      operator: "Exists"
#      effect: "NoExecute"
#      tolerationSeconds: 6000
#    priorityClassName: high-priority
#    annotations:
#      iam.amazonaws.com/role: role-arn
#    labels:
#      rack: rack-22
#    nodeSelector:
#      disktype: ssd
#    storage:
#      engine: wiredTiger
#      wiredTiger:
#        engineConfig:
#          cacheSizeRatio: 0.5
#          directoryForIndexes: false
#          journalCompressor: snappy
#        collectionConfig:
#          blockCompressor: snappy
#        indexConfig:
#          prefixCompression: true
#      inMemory:
#        engineConfig:
#           inMemorySizeRatio: 0.5
#    livenessProbe:
#      failureThreshold: 4
#      initialDelaySeconds: 60
#      periodSeconds: 30
#      timeoutSeconds: 10
#      startupDelaySeconds: 7200
#    readinessProbe:
#      failureThreshold: 8
#      initialDelaySeconds: 10
#      periodSeconds: 3
#      successThreshold: 1
#      timeoutSeconds: 2
#    runtimeClassName: image-rc
#    sidecars:
#    - image: busybox
#      command: ["/bin/sh"]
#      args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
#      name: rs-sidecar-1
#      volumeMounts:
#        - mountPath: /volume1
#          name: sidecar-volume-claim
#        - mountPath: /secret
#          name: sidecar-secret
#        - mountPath: /configmap
#          name: sidecar-config
#    sidecarVolumes:
#    - name: sidecar-secret
#      secret:
#        secretName: mysecret
#    - name: sidecar-config
#      configMap:
#        name: myconfigmap
#    sidecarPVCs:
#    - apiVersion: v1
#      kind: PersistentVolumeClaim
#      metadata:
#        name: sidecar-volume-claim
#      spec:
#        resources:
#          requests:
#            storage: 1Gi
#        volumeMode: Filesystem
#        accessModes:
#          - ReadWriteOnce
    podDisruptionBudget:
      maxUnavailable: 1
#      minAvailable: 0
    expose:
      enabled: true
      exposeType: NodePort
#      loadBalancerSourceRanges:
#        - 10.0.0.0/8
#      serviceAnnotations:
#        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#      serviceLabels:
#        rack: rack-22
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
#      emptyDir: {}
#      hostPath:
#        path: /data
#        type: Directory
      persistentVolumeClaim:
#        storageClassName: standard
#        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 3Gi

    nonvoting:
      enabled: false
#      podSecurityContext: {}
#      containerSecurityContext: {}
      size: 3
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        operationProfiling:
#          mode: slowOp
#        systemLog:
#          verbosity: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
      podDisruptionBudget:
        maxUnavailable: 1
#        minAvailable: 0
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
#        emptyDir: {}
#        hostPath:
#          path: /data
#          type: Directory
        persistentVolumeClaim:
#          storageClassName: standard
#          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd

  sharding:
    enabled: false

    configsvrReplSet:
      size: 3
#      externalNodes:
#      - host: 34.124.76.93
#      - host: 34.124.76.94
#        port: 27017
#        votes: 0
#        priority: 0
#      - host: 34.124.76.95
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        operationProfiling:
#          mode: slowOp
#        systemLog:
#           verbosity: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
#      livenessProbe:
#        failureThreshold: 4
#        initialDelaySeconds: 60
#        periodSeconds: 30
#        timeoutSeconds: 10
#        startupDelaySeconds: 7200
#      readinessProbe:
#        failureThreshold: 3
#        initialDelaySeconds: 10
#        periodSeconds: 3
#        successThreshold: 1
#        timeoutSeconds: 2
#      runtimeClassName: image-rc
#      sidecars:
#      - image: busybox
#        command: ["/bin/sh"]
#        args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
#        name: rs-sidecar-1
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
#        loadBalancerSourceRanges:
#          - 10.0.0.0/8
#        serviceAnnotations:
#          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#        serviceLabels:
#          rack: rack-22
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
#       emptyDir: {}
#       hostPath:
#         path: /data
#         type: Directory
        persistentVolumeClaim:
#          storageClassName: standard
#          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 3Gi

    mongos:
      size: 3
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        systemLog:
#           verbosity: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
#      livenessProbe:
#        failureThreshold: 4
#        initialDelaySeconds: 60
#        periodSeconds: 30
#        timeoutSeconds: 10
#        startupDelaySeconds: 7200
#      readinessProbe:
#        failureThreshold: 3
#        initialDelaySeconds: 10
#        periodSeconds: 3
#        successThreshold: 1
#        timeoutSeconds: 2
#      runtimeClassName: image-rc
#      sidecars:
#      - image: busybox
#        command: ["/bin/sh"]
#        args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
#        name: rs-sidecar-1
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      expose:
        exposeType: ClusterIP
#        servicePerPod: true
#        loadBalancerSourceRanges:
#          - 10.0.0.0/8
#        serviceAnnotations:
#          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#        serviceLabels:
#          rack: rack-22

#  mongod:
#    security:
#      encryptionKeySecret: "my-cluster-name-mongodb-encryption-key"

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:1.8.1
    serviceAccountName: percona-server-mongodb-operator
#    annotations:
#      iam.amazonaws.com/role: role-arn
#    resources:
#      limits:
#        cpu: "300m"
#        memory: "0.5G"
#      requests:
#        cpu: "300m"
#        memory: "0.5G"
#    storages:
#      s3-us-west:
#        type: s3
#        s3:
#          bucket: S3-BACKUP-BUCKET-NAME-HERE
#          credentialsSecret: my-cluster-name-backup-s3
#          region: us-west-2
#          prefix: ""
#          uploadPartSize: 10485760
#          maxUploadParts: 10000
#          storageClass: STANDARD
#          insecureSkipTLSVerify: false
#      minio:
#        type: s3
#        s3:
#          bucket: MINIO-BACKUP-BUCKET-NAME-HERE
#          region: us-east-1
#          credentialsSecret: my-cluster-name-backup-minio
#          endpointUrl: http://minio.psmdb.svc.cluster.local:9000/minio/
#          insecureSkipTLSVerify: false
#          prefix: ""
#      azure-blob:
#        type: azure
#        azure:
#          container: CONTAINER-NAME
#          prefix: PREFIX-NAME
#          credentialsSecret: SECRET-NAME
    pitr:
      enabled: false
#      oplogSpanMin: 10
      compressionType: gzip
      compressionLevel: 6
#    tasks:
#      - name: daily-s3-us-west
#        enabled: true
#        schedule: "0 0 * * *"
#        keep: 3
#        storageName: s3-us-west
#        compressionType: gzip
#        compressionLevel: 6
#      - name: weekly-s3-us-west
#        enabled: false
#        schedule: "0 0 * * 0"
#        keep: 5
#        storageName: s3-us-west
#        compressionType: gzip
#        compressionLevel: 6
1 Like

I generated certificates manually to solve it. I put the cluster IPs into certificates:

cat <<EOF | cfssl gencert -ca=ca.pem  -ca-key=ca-key.pem -config=./ca-config.json - | cfssljson -bare server
  {
    "hosts": [
      "localhost",
      "10.181.4.14",
      "10.181.5.143",
      "10.181.5.148",
      "10.181.5.149",
      "10.181.4.145",
      "10.181.4.174",
      "10.181.4.181",
      "${CLUSTER_NAME}-rs0",
      "${CLUSTER_NAME}-rs0.${NAMESPACE}",
      "${CLUSTER_NAME}-rs0.${NAMESPACE}.svc.cluster.local",
      "*.${CLUSTER_NAME}-rs0",
      "*.${CLUSTER_NAME}-rs0.${NAMESPACE}",
      "*.${CLUSTER_NAME}-rs0.${NAMESPACE}.svc.cluster.local"
    ],
    "names": [
      {
        "O": "PSMDB"
      }
    ],
    "CN": "${CLUSTER_NAME/-rs0}",
    "key": {
      "algo": "rsa",
      "size": 2048
    }
  }
EOF```

If I enable the replset 'expose' option and set the 'NodePort 'type, I must  generated certificates manually.
1 Like

@Worlder_Mo this is awsm that you figured it out! Thank you.
I created a bug in our JIRA to track it: [K8SPSMDB-823] ReplicaSet nodeport exposure breaks backups - Percona JIRA

1 Like

I think the issue about TLS validation should be an automated operation.

1 Like

hello @Worlder_Mo .
I am also experiencing the same error. May I ask how you created the certficate and put it in?

1 Like

This is mentioned in the TLS section of the official deployment documentation