jamoser
November 13, 2023, 10:23am
1
Hello
Not sure, but could it be that there is a problem with GKE 1.26 and Operator 1.14?
→ after a delete/apply the pods did not start due to:
Events:
  Type     Reason       Age                  From                                   Message
  Normal   Scheduled    2m56s                gke.io/optimize-utilization-scheduler  Successfully assigned performance-mongodb-clusters/luz-mongodb00-cluster-rs-0 to gke-performance-default-pool-b4cd7cbc-lm6j
  Warning  FailedMount  53s                  kubelet                                Unable to attach or mount volumes: unmounted volumes=[mongod-data], unattached volumes=[ssl config users-secret-file ssl-internal kube-api-access-fg9wc mongodb00-cluster-mongodb-keyfile mongod-data bin mongodb-cluster-encryption-key]: timed out waiting for the condition
  Warning  FailedMount  48s (x9 over 2m56s)  kubelet                                MountVolume.MountDevice failed for volume "pvc-55608254-6b41-43e1-a591-9b19484bdf63": rpc error: code = Aborted desc = An operation with the given Volume ID projects/UNSPECIFIED/zones/europe-west6-a/disks/gke-performance--pvc-55608254-6b41-43e1-a591-9b19484bdf63 already exists
=> It's kind of urgent!
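For anyone debugging the same thing: the "operation already exists" in the last event points at a pending disk operation on the GCP side, which one could try to look up with gcloud (a sketch only; zone and volume ID taken from the event above):

# gcloud compute operations list --zones=europe-west6-a --filter="targetLink~pvc-55608254-6b41-43e1-a591-9b19484bdf63"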
Hi @jamoser!
Do you have the delete-psmdb-pvc finalizer enabled or not?
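For reference, when enabled this finalizer is listed in the CR metadata and makes the operator delete the PVCs together with the cluster; roughly like this (a sketch with a placeholder cluster name):

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
  finalizers:
    - delete-psmdb-pvc   # if present, PVCs are removed when the cluster CR is deleted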
jamoser
November 13, 2023, 12:03pm
3
Hello
No, but I hope delete-psmdb-pvc is off if it is not mentioned in the CR!
The PVC/PV are still there …
pvc-55608254-6b41-43e1-a591-9b19484bdf63 already exists
The issue seems to be that it cannot attach the existing PVCs. And it seems to be related to GKE 1.26, because with 1.25 it worked.
Important: the cluster is running in a non-sharding setup.
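To double-check, the finalizer list can be queried directly from the CR (a sketch, using the namespace from the events and my CR name):

# kubectl -n performance-mongodb-clusters get psmdb my-mongodb00-cluster -o jsonpath='{.metadata.finalizers}'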
I cannot seem to reproduce the issue.
Before delete:

# k get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
gke-plavi-126-default-pool-673a53d4-8tcn   Ready    <none>   32m   v1.26.10-gke.1038000
gke-plavi-126-default-pool-673a53d4-kt38   Ready    <none>   31m   v1.26.10-gke.1038000
gke-plavi-126-default-pool-673a53d4-tvfb   Ready    <none>   32m   v1.26.10-gke.1038000

# k get pods
NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-rs0-0                              2/2     Running   0          110s
my-cluster-name-rs0-1                              2/2     Running   0          81s
my-cluster-name-rs0-2                              2/2     Running   0          58s
percona-server-mongodb-operator-7b46fb8f97-rt57l   1/1     Running   0          2m27s

# k get pods
NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-rs0-0                              2/2     Running   0          119s
my-cluster-name-rs0-1                              2/2     Running   0          90s
my-cluster-name-rs0-2                              2/2     Running   0          67s
percona-server-mongodb-operator-7b46fb8f97-rt57l   1/1     Running   0          2m36s

# k get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-rs0-0   Bound    pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            standard-rwo   2m18s
mongod-data-my-cluster-name-rs0-1   Bound    pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            standard-rwo   109s
mongod-data-my-cluster-name-rs0-2   Bound    pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            standard-rwo   86s

# k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-0   standard-rwo            2m17s
pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-2   standard-rwo            86s
pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-1   standard-rwo            108s
After delete:

# k delete -f cr.yaml
perconaservermongodb.psmdb.percona.com "my-cluster-name" deleted

# k get pods
NAME                                               READY   STATUS    RESTARTS   AGE
percona-server-mongodb-operator-7b46fb8f97-rt57l   1/1     Running   0          4m3s

# k get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-rs0-0   Bound    pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            standard-rwo   3m33s
mongod-data-my-cluster-name-rs0-1   Bound    pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            standard-rwo   3m4s
mongod-data-my-cluster-name-rs0-2   Bound    pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            standard-rwo   2m41s

# k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-0   standard-rwo            3m34s
pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-2   standard-rwo            2m43s
pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-1   standard-rwo            3m5s
After re-apply:

# ka cr.yaml
perconaservermongodb.psmdb.percona.com/my-cluster-name created

# k get pods
NAME                                               READY   STATUS    RESTARTS   AGE
my-cluster-name-rs0-0                              2/2     Running   0          97s
my-cluster-name-rs0-1                              2/2     Running   0          76s
my-cluster-name-rs0-2                              2/2     Running   0          51s
percona-server-mongodb-operator-7b46fb8f97-rt57l   1/1     Running   0          6m17s

# k get pvc
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongod-data-my-cluster-name-rs0-0   Bound    pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            standard-rwo   5m43s
mongod-data-my-cluster-name-rs0-1   Bound    pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            standard-rwo   5m14s
mongod-data-my-cluster-name-rs0-2   Bound    pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            standard-rwo   4m51s

# k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-00321f8e-7fbe-41f8-baf8-b9aaca70948e   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-0   standard-rwo            5m41s
pvc-693fd39e-8584-4b8c-ba71-358f4c0234e1   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-2   standard-rwo            4m50s
pvc-e03b82c9-57bb-4209-82b5-f37c5885bd2f   3Gi        RWO            Delete           Bound    test/mongod-data-my-cluster-name-rs0-1   standard-rwo            5m12s
Can you maybe share your cr.yaml without the sensitive data?
Also, by the way: Operator 1.14 was not officially tested with GKE 1.26; 1.25 was the latest tested version, although the run above was done with 1.14 on 1.26.
jamoser
November 13, 2023, 2:35pm
7
I can remember that it worked with 1.24 and 1.25. But Google updated the nodes and we got a surprise …
Below is the cr.yaml:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  labels:
    xyz.com/module: my-mongodb
  name: my-mongodb00-cluster
  namespace: performance-mongodb-clusters
spec:
  allowUnsafeConfigurations: false
  backup:
    enabled: false
    image: percona/percona-backup-mongodb:2.0.4
  crVersion: 1.14.0
  image: percona/percona-server-mongodb:4.4.16-16
  imagePullPolicy: Always
  mongod:
    net:
      hostPort: 0
      port: 27017
    operationProfiling:
      mode: slowOp
      rateLimit: 100
      slowOpThresholdMs: 1000
    security:
      enableEncryption: true
      encryptionCipherMode: AES256-CBC
      encryptionKeySecret: my-mongodb-cluster-encryption-key
      redactClientLogData: false
    setParameter:
      ttlMonitorSleepSecs: 60
      wiredTigerConcurrentReadTransactions: 128
      wiredTigerConcurrentWriteTransactions: 128
    storage:
      engine: wiredTiger
      wiredTiger:
        collectionConfig:
          blockCompressor: snappy
        engineConfig:
          cacheSizeRatio: 0.005
          directoryForIndexes: false
          journalCompressor: snappy
        indexConfig:
          prefixCompression: true
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.35.0
  replsets:
  - affinity:
      antiAffinityTopologyKey: kubernetes.io/hostname
    arbiter:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: false
      size: 1
    configuration: |
      systemLog:
        quiet: true
      storage:
        directoryPerDB: true
        wiredTiger:
          engineConfig:
            configString: "file_manager=(close_idle_time=300,close_scan_interval=60,close_handle_minimum=1000)"
    expose:
      enabled: true
      exposeType: NodePort
    clusterServiceDNSMode: External
    livenessProbe:
      failureThreshold: 40
      initialDelaySeconds: 1800
    name: rs
    tolerations:
    - effect: NoSchedule
      key: mongodb
      operator: Exists
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: 7000m
        memory: 16G
      requests:
        cpu: 10m
        memory: 0.5G
    size: 3
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 32Gi
        storageClassName: my-mongodb-standard
  secrets:
    encryptionKey: my-mongodb-cluster-encryption-key
    users: my-mongodb-cluster-secrets
  sharding:
    enabled: false
  updateStrategy: SmartUpdate
jamoser
November 13, 2023, 4:53pm
9
OK … I had to upgrade to Operator 1.15, and then it worked.
It then showed this error:
Multi-Attach error for volume "pvc-xxxxx" Volume is already exclusively attached to one node and can't be attached to another
Not sure if this was the same error as with Operator 1.14, or if 1.15 just handled it differently.
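For reference, the stale attachment behind such a Multi-Attach error can be inspected via the cluster-scoped VolumeAttachment objects (a sketch; the attachment name is a placeholder):

# kubectl get volumeattachments
# kubectl describe volumeattachment <attachment-name>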
Do you have a matrix where one can see which Operator version is supported on which GKE version?
Unfortunately, our docs only show system requirements for the latest released version: System requirements - Percona Operator for MongoDB
But I have now created a ticket to add such a matrix for all operators: [CLOUD-819] Create matrix of supported platforms for each operator and tag docs repo - Percona JIRA
In the ticket I added a table for the last 3 versions of the PSMDB Operator (if you need more at the moment, please comment and I will add more).