Hi @Sergey_Pronin,
I have two different Kubernetes clusters in two different regions that are reachable from each other.
I am facing an issue while following the document “Set up Percona Server for MongoDB cross-site replication”.
The replica-site pods keep restarting with the error below:
Normal Killing 29m (x2 over 32m) kubelet Container mongod failed liveness probe, will be restarted
Normal Pulled 29m (x3 over 35m) kubelet Container image "xyz.myrepo.com/percona-mongo/percona-server-mongodb:6.0.9-7" already present on machine
Normal Created 29m (x3 over 35m) kubelet Created container mongod
Normal Started 29m (x3 over 35m) kubelet Started container mongod
Warning BackOff 10m (x7 over 11m) kubelet Back-off restarting failed container mongod in pod mongodb-db-psmdb-db-rs0-0_mongo-db(7c0bda6c-da2e-4e74-8011-066746f99434)
Warning Unhealthy 16s (x40 over 33m) kubelet Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out
NOTE: I have made sure that the my-cluster-name-secrets, my-cluster-name-ssl, and my-cluster-name-ssl-internal secrets from the main site are available on the replica site.
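For what it's worth, this is roughly how I checked on the replica site that the copied secrets exist and carry the same CA as the main site (the secret names and the mongo-db namespace are from my deployment):

# On the replica site: confirm the copied secrets are present
kubectl -n mongo-db get secret mongodb-db-psmdb-db-secrets mongodb-db-psmdb-db-ssl mongodb-db-psmdb-db-ssl-internal

# Compare the CA between sites (run against each cluster and compare the hashes)
kubectl -n mongo-db get secret mongodb-db-psmdb-db-ssl -o jsonpath='{.data.ca\.crt}' | sha256sum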
DETAILS
Our Cluster:
I have two different Kubernetes clusters with network connectivity to each other. I want the primary and some replica set members in one cluster, and additional members only in the other cluster.
Here is what I did:
I downloaded the Helm charts for the Percona Operator and Percona Server for MongoDB, version 1.15.0.
I followed the document “Set up Percona Server for MongoDB cross-site replication”.
I deployed the operator in both clusters using the values.yaml file below; it is up successfully in both clusters. (The install command I used is shown after the values file.)
# Default values for psmdb-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 3

image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb-operator
  tag: 1.15.0
  pullPolicy: IfNotPresent

# disableTelemetry: according to
# https://docs.percona.com/percona-operator-for-mongodb/telemetry.html
# this is how you can disable telemetry collection
# default is false which means telemetry will be collected
disableTelemetry: false

# set if you want to specify a namespace to watch
# defaults to `.Release.namespace` if left blank
# watchNamespace:

# set if operator should be deployed in cluster wide mode. defaults to false
# watchAllNamespaces: false
watchAllNamespaces: true

# rbac: settings for deployer RBAC creation
rbac:
  # rbac.create: if false RBAC resources should be in place
  create: true

# serviceAccount: settings for Service Accounts used by the deployer
serviceAccount:
  # serviceAccount.create: Whether to create the Service Accounts or not
  create: true

# podAnnotations: {}
#   prometheus.io/scrape: "true"
#   prometheus.io/port: "8080"
podAnnotations:
  mcs.xxxxxxx/enable: "true"

podSecurityContext: {}
  # runAsNonRoot: true
  # runAsUser: 2
  # runAsGroup: 2
  # fsGroup: 2
  # fsGroupChangePolicy: "OnRootMismatch"

securityContext: {}
  # allowPrivilegeEscalation: false
  # capabilities:
  #   drop:
  #     - ALL
  # seccompProfile:
  #   type: RuntimeDefault

# set if you want to use a different operator name
# defaults to `percona-server-mongodb-operator`
# operatorName:

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  resyncPeriod: 5s

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

nodeSelector: {}

tolerations: []

# affinity: {}
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname

# logStructured: false
# logLevel: "INFO"
logStructured: true
logLevel: "INFO"
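For completeness, the operator install in each cluster looked roughly like this (the release name, the mongo-db namespace, and the values file name are just what I used; the chart version is assumed to match operator 1.15.0):

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

# run once per cluster, against that cluster's kubeconfig/context
helm install psmdb-operator percona/psmdb-operator \
  --namespace mongo-db --create-namespace \
  --version 1.15.0 \
  -f psmdb-operator-values.yaml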
Now, on the main site, I deployed MongoDB using the values.yaml file below, and it is also up successfully. (The helm command I used is shown after the values file.)
finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""

crVersion: 1.15.0
pause: false
unmanaged: false
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false

image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7

imagePullPolicy: IfNotPresent
secrets: {}

pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service

replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: hdd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"

sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP

backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:
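The main-site deployment itself was roughly the following (the release name mongodb-db and the namespace mongo-db match the pod name in the events above; the chart version is assumed to match 1.15.0):

helm install mongodb-db percona/psmdb-db \
  --namespace mongo-db \
  --version 1.15.0 \
  -f main-site-values.yaml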
Then I copied the secrets below from the main site and deployed them to the replica site. I removed the annotations, creationTimestamp, resourceVersion, selfLink, and uid metadata fields from the resulting file to make it ready for the replica site. (The commands I used are shown after the list.)
NAME                               TYPE                DATA   AGE
mongodb-db-psmdb-db-secrets        Opaque              10     4d20h
mongodb-db-psmdb-db-ssl            kubernetes.io/tls   3      4d19h
mongodb-db-psmdb-db-ssl-internal   kubernetes.io/tls   3      4d19h
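Roughly, the copy looked like this (namespace as in my setup; I edited the exported file by hand to drop the metadata fields listed above):

# On the main site: export the three secrets
kubectl -n mongo-db get secret \
  mongodb-db-psmdb-db-secrets \
  mongodb-db-psmdb-db-ssl \
  mongodb-db-psmdb-db-ssl-internal \
  -o yaml > secrets.yaml

# Edit secrets.yaml to remove annotations, creationTimestamp, resourceVersion,
# selfLink, and uid, then apply it on the replica site:
kubectl -n mongo-db apply -f secrets.yaml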
Now I deployed the replica-site MongoDB with the following changes in the values file (the install command, run against the replica cluster, is shown after the values file):
updateStrategy: OnDelete
unmanaged: true
backup.enabled: false
My values.yaml for the replica site:
finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""

crVersion: 1.15.0
pause: false
unmanaged: true
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: OnDelete
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false

image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7

imagePullPolicy: IfNotPresent
secrets: {}

pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service

replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: ssd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"

sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP

backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:
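The replica-site install was the same helm command as on the main site, just pointed at the replica cluster (the kube context name and values file name are placeholders):

helm install mongodb-db percona/psmdb-db \
  --kube-context replica-site \
  --namespace mongo-db --create-namespace \
  --version 1.15.0 \
  -f replica-site-values.yaml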
On deploying this, the replica cluster comes up, but the pods keep restarting with the warnings below.
Normal Killing 29m (x2 over 32m) kubelet Container mongod failed liveness probe, will be restarted
Normal Pulled 29m (x3 over 35m) kubelet Container image "xyz.myrepo.com/percona-mongo/percona-server-mongodb:6.0.9-7" already present on machine
Normal Created 29m (x3 over 35m) kubelet Created container mongod
Normal Started 29m (x3 over 35m) kubelet Started container mongod
Warning BackOff 10m (x7 over 11m) kubelet Back-off restarting failed container mongod in pod mongodb-db-psmdb-db-rs0-0_mongo-db(7c0bda6c-da2e-4e74-8011-066746f99434)
Warning Unhealthy 16s (x40 over 33m) kubelet Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out
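The events above are from kubectl describe pod on the replica site. To dig further, I can also check the previous container log and run the exact liveness command from the probe by hand inside the pod:

# previous (crashed) mongod container log
kubectl -n mongo-db logs mongodb-db-psmdb-db-rs0-0 -c mongod --previous

# run the same liveness check the kubelet runs
kubectl -n mongo-db exec -it mongodb-db-psmdb-db-rs0-0 -c mongod -- \
  /opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure \
  --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem \
  --startupDelaySeconds 7200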
I made sure that all three of the main site's secrets were copied, but I still don't understand why it fails.
I also assume that once the replica site is up, I just need to add the replica members' IP addresses (or their ClusterIP service addresses) to the main-site values file like below and do a helm upgrade. (A sketch of the command is after the snippet.)
replsets:
  - name: rs0
    size: 3
    externalNodes:
      - host: 34.124.76.XX
      - host: 34.124.76.XX
        port: 27017
        votes: 0
        priority: 0
      - host: 34.124.76.XX
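i.e., something like this against the main site (same release name and namespace as before; main-site-values.yaml would now contain the externalNodes entries):

helm upgrade mongodb-db percona/psmdb-db \
  --namespace mongo-db \
  --version 1.15.0 \
  -f main-site-values.yaml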
Please help me with this. I don't see any clear document or video that shows the Helm-based installation for this scenario. Such a document would be really helpful, as I am pretty sure many others have run into this issue.