Our MongoDB cluster failed to initialize and the operator is failing to authenticate. The psmdb resource is stuck in the initializing state.
Kubernetes 1.22
Percona Operator 1.13
I'm not sure what caused the problem, but we did switch the users to an external secret and removed the user data from the psmdb resource (i.e. from the Helm chart overrides). A sketch of the Secret layout we are aiming for is below.
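For context, the external users Secret referenced by spec.secrets.users follows the operator's system-users layout. This is only a minimal sketch of what we understand it should contain (key names from memory of the operator docs, usernames are the defaults, passwords redacted, namespace inferred from the service hostnames below):

apiVersion: v1
kind: Secret
metadata:
  name: mongodb-psmdb-db-secrets
  namespace: flexibleflyer
type: Opaque
stringData:
  MONGODB_BACKUP_USER: backup
  MONGODB_BACKUP_PASSWORD: <redacted>
  MONGODB_CLUSTER_ADMIN_USER: clusterAdmin
  MONGODB_CLUSTER_ADMIN_PASSWORD: <redacted>
  MONGODB_CLUSTER_MONITOR_USER: clusterMonitor
  MONGODB_CLUSTER_MONITOR_PASSWORD: <redacted>
  MONGODB_USER_ADMIN_USER: userAdmin
  MONGODB_USER_ADMIN_PASSWORD: <redacted>
  MONGODB_DATABASE_ADMIN_USER: databaseAdmin
  MONGODB_DATABASE_ADMIN_PASSWORD: <redacted>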
psmdb resource:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
  name: mongodb-psmdb-db
  labels:
    app.kubernetes.io/name: psmdb-db
    helm.sh/chart: psmdb-db-1.13.0
    app.kubernetes.io/instance: mongodb
    app.kubernetes.io/version: "1.13.0"
    app.kubernetes.io/managed-by: Helm
  finalizers:
    - delete-psmdb-pods-in-order
    - delete-psmdb-pvc
spec:
  crVersion: 1.13.0
  pause: false
  unmanaged: false
  image: "percona/percona-server-mongodb:5.0.11-10"
  imagePullPolicy: "IfNotPresent"
  multiCluster:
    enabled: false
  secrets:
    users: mongodb-psmdb-db-secrets
    encryptionKey: mongodb-psmdb-db-mongodb-encryption-key
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: 0 2 * * *
    setFCV: false
  pmm:
    enabled: false
    image: "percona/pmm-client:2.30.0"
    serverHost: monitoring-service
  replsets:
    - name: rs0
      size: 3
      configuration: |
        systemLog:
          verbosity: 0
        auditLog:
          destination: console
          filter: '{ "param.ns" : { $ne: "local.replset.oplogTruncateAfterPoint" }}'
      affinity:
        antiAffinityTopologyKey:
      nodeSelector:
        ffServerType: utility
        ffUse: databases
        ffUseType: worker
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          memory: 5G
        requests:
          memory: 2G
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 5Gi
  sharding:
    enabled: false
    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    mongos:
      size: 2
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      expose:
        exposeType: ClusterIP
  backup:
    enabled: false
    image: "percona/percona-backup-mongodb:1.7.0"
    serviceAccountName: percona-server-mongodb-operator
    storages:
      s3-us-east:
        s3:
          bucket: ff-mongo-backup-recovery
          insecureSkipTLSVerify: false
          maxUploadParts: 10000
          prefix: dev
          region: us-east-1
          storageClass: STANDARD
          uploadPartSize: 10485760
        type: s3
    pitr:
      enabled: false
    tasks:
      - compressionType: gzip
        enabled: true
        keep: 3
        name: daily-s3-us-east
        schedule: 0 0 * * *
        storageName: s3-us-east
      - compressionType: gzip
        enabled: false
        keep: 5
        name: weekly-s3-us-east
        schedule: 0 0 * * 0
        storageName: s3-us-east
PSMDB:
mongodb-psmdb-db mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local initializing 16d
PSMDB Status:
- lastTransitionTime: "2023-01-05T17:48:56Z"
  message: 'create pbm object: create PBM connection to mongodb-psmdb-db-rs0-0.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-1.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-2.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017:
    create mongo connection: mongo ping: connection() error occured during connection
    handshake: auth error: unable to authenticate using mechanism "SCRAM-SHA-256":
    (AuthenticationFailed) Authentication failed.'
  reason: ErrorReconcile
  status: "True"
  type: error
- lastTransitionTime: "2023-01-05T17:48:56Z"
  status: "True"
  type: ready
- lastTransitionTime: "2023-01-05T17:48:56Z"
  status: "True"
  type: initializing
host: mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local
mongoImage: percona/percona-server-mongodb:5.0.11-10
mongoVersion: 5.0.11-10
observedGeneration: 7
ready: 3
replsets:
  rs0:
    initialized: true
    ready: 3
    size: 3
    status: ready
size: 3
state: initializing
kind: List
metadata:
  resourceVersion: ""
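Since the error is an authentication failure on the operator's PBM connection, one manual check is whether the backup credentials in the users Secret are accepted by mongod at all. A rough sketch of that check (it assumes the standard MONGODB_BACKUP_USER/MONGODB_BACKUP_PASSWORD keys, that the legacy mongo shell ships in the percona-server-mongodb:5.0 image, and that non-TLS connections on localhost are still allowed):

# pull the backup credentials out of the users secret
BACKUP_USER=$(kubectl -n flexibleflyer get secret mongodb-psmdb-db-secrets \
  -o jsonpath='{.data.MONGODB_BACKUP_USER}' | base64 -d)
BACKUP_PASS=$(kubectl -n flexibleflyer get secret mongodb-psmdb-db-secrets \
  -o jsonpath='{.data.MONGODB_BACKUP_PASSWORD}' | base64 -d)

# try to authenticate against one of the rs0 members with those credentials
kubectl -n flexibleflyer exec mongodb-psmdb-db-rs0-0 -- \
  mongo --host localhost --port 27017 \
  -u "$BACKUP_USER" -p "$BACKUP_PASS" --authenticationDatabase admin \
  --quiet --eval 'db.runCommand({ connectionStatus: 1 })'

If this also fails with AuthenticationFailed, the secret and the users actually stored in the database are out of sync.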
Any idea how to recover the cluster? The pods are running fine, but I can't upgrade.