"Observed a panic in reconciler" on the backup restore request [perconaxtradbclusterrestore.pxc.percona.com]

Hello,

I am following the instructions from this document in order to restore my PXC cluster from a backup located on S3.

As my first step, I have created a secret and installed PMM into the freshly created namespace.

helm install pmm --namespace percona --set secret.create=false --set secret.name=pmm-secret percona/pmm
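
For completeness, the secret referenced above was created roughly like this (a sketch from memory; as far as I remember the percona/pmm chart reads the admin password from the PMM_ADMIN_PASSWORD key, and the value here is just a placeholder):

kubectl create namespace percona
kubectl create secret generic pmm-secret -n percona \
    --from-literal=PMM_ADMIN_PASSWORD='<admin-password>'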

Then I have installed the Percona Operator for MySQL (based on Percona XtraDB Cluster) into the same namespace:

helm install percona-operator percona/pxc-operator --namespace percona

And the Percona XtraDB Cluster itself alongside it, with the following flags:

--set haproxy.nodeSelector.cluster=percona-cluster-node \
--set pxc.nodeSelector.cluster=percona-cluster-node \
--set pxc.resources.requests.cpu=2500m \
--set pxc.resources.requests.memory=6Gi \
--set pxc.resources.limits.cpu=2500m \
--set pxc.resources.limits.memory=6Gi \
--set pxc.persistence.enabled=true \
--set pxc.persistence.size=15Gi \
--set pxc.persistence.storageClass=hcloud-volumes \
--set pmm.enabled=true \
--set pmm.resources.requests.cpu=200m \
--set pmm.resources.requests.memory=150M \
--set pmm.resources.limits.cpu=200m \
--set pmm.resources.limits.memory=150M \
--set haproxy.resources.requests.cpu=300m \
--set haproxy.resources.requests.memory=512M \
--set haproxy.resources.limits.cpu=300m \
--set haproxy.resources.limits.memory=512M \
--set logcollector.resources.requests.cpu=100m \
--set logcollector.resources.requests.memory=100M \
--set backup.enabled=false \
--set secrets.passwords.root=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.pmmserverkey=**************************************  \
--set secrets.passwords.xtrabackup=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.replication=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.proxyadmin=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.operator=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.monitor=$(openssl rand -base64 32 | tr -d '\n') \
--set secrets.passwords.clustercheck=$(openssl rand -base64 32 | tr -d '\n')
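
For readability, the same overrides could also be kept in a values file and passed with -f instead of the long list of --set flags; this is a sketch of the equivalent values.yaml (the secrets.passwords entries are left out here):

pxc:
  nodeSelector:
    cluster: percona-cluster-node
  resources:
    requests: { cpu: 2500m, memory: 6Gi }
    limits: { cpu: 2500m, memory: 6Gi }
  persistence:
    enabled: true
    size: 15Gi
    storageClass: hcloud-volumes
haproxy:
  nodeSelector:
    cluster: percona-cluster-node
  resources:
    requests: { cpu: 300m, memory: 512M }
    limits: { cpu: 300m, memory: 512M }
pmm:
  enabled: true
  resources:
    requests: { cpu: 200m, memory: 150M }
    limits: { cpu: 200m, memory: 150M }
logcollector:
  resources:
    requests: { cpu: 100m, memory: 100M }
backup:
  enabled: false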

At that point the whole setup seemed healthy and functional to me: I was able to log in to the PMM dashboard and observe some cluster data there, and I could not spot any restarting pods or other issues on my Kubernetes dashboard either.

root@master-001:~/k8s/pxc# kubectl get all -n percona
NAME                                                 READY   STATUS    RESTARTS      AGE
pod/percona-cluster-pxc-d-haproxy-0                  3/3     Running   0             179m
pod/percona-cluster-pxc-d-haproxy-1                  3/3     Running   0             179m
pod/percona-cluster-pxc-d-haproxy-2                  3/3     Running   0             3h
pod/percona-cluster-pxc-d-pxc-0                      4/4     Running   0             3h
pod/percona-cluster-pxc-d-pxc-1                      4/4     Running   0             3h1m
pod/percona-cluster-pxc-d-pxc-2                      4/4     Running   0             3h2m
pod/percona-operator-pxc-operator-687f56d59d-4hkqj   1/1     Running   0             18h
pod/pmm-0                                            1/1     Running   0             46h

NAME                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                 AGE
service/monitoring-service                       NodePort    10.xxx.xxx.211   <none>        443:31958/TCP,80:32217/TCP              46h
service/percona-cluster-pxc-d-haproxy            ClusterIP   10.xxx.xxx.59       <none>        3306/TCP,3309/TCP,33062/TCP,33060/TCP   3h4m
service/percona-cluster-pxc-d-haproxy-replicas   ClusterIP   10.xxx.xxx.92    <none>        3306/TCP                                3h4m
service/percona-cluster-pxc-d-pxc                ClusterIP   None             <none>        3306/TCP,33062/TCP,33060/TCP            3h4m
service/percona-cluster-pxc-d-pxc-unready        ClusterIP   None             <none>        3306/TCP,33062/TCP,33060/TCP            3h4m
service/percona-xtradb-cluster-operator          ClusterIP   10.xxx.xxx.172    <none>        443/TCP                                 18h

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/percona-operator-pxc-operator   1/1     1            1           18h

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/percona-operator-pxc-operator-687f56d59d   1         1         1       18h

NAME                                             READY   AGE
statefulset.apps/percona-cluster-pxc-d-haproxy   3/3     3h4m
statefulset.apps/percona-cluster-pxc-d-pxc       3/3     3h4m
statefulset.apps/pmm                             1/1     46h
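
Another quick check that can be done at this point, for reference, is the status of the custom resource itself (pxc is the short name the operator registers for perconaxtradbcluster):

kubectl get pxc -n percona
kubectl get pxc percona-cluster-pxc-d -n percona -o jsonpath='{.status.state}{"\n"}'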

As for my source environment, it is a regular Ubuntu 22.04 server with mysql-server installed. I have installed XtraBackup 8.0 there and made a successful backup to my AWS S3 bucket.

wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
percona-release setup ps80
sudo apt update
sudo apt install percona-xtrabackup-80
sudo apt install lz4
xtrabackup
xtrabackup --backup --stream=xbstream --extra-lsndir=/tmp --target-dir=/tmp | \
    xbcloud put --storage=s3 \
    --s3-endpoint='s3.amazonaws.com' \
    --s3-region='eu-central-1' \
    --s3-access-key=$AWS_ACCESS_KEY_ID \
    --s3-secret-key=$AWS_SECRET_ACCESS_KEY \
    --s3-bucket='****************-percona-db-backups' \
    --parallel=10 \
    $(date -I)-full_backup
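
In case it matters, a backup taken this way can also be pulled back and prepared locally with the same tools; a rough sketch (the target directory is just a placeholder):

mkdir -p /tmp/restore
xbcloud get --storage=s3 \
    --s3-endpoint='s3.amazonaws.com' \
    --s3-region='eu-central-1' \
    --s3-access-key=$AWS_ACCESS_KEY_ID \
    --s3-secret-key=$AWS_SECRET_ACCESS_KEY \
    --s3-bucket='****************-percona-db-backups' \
    --parallel=10 \
    2024-05-08-full_backup | xbstream -x -C /tmp/restore
xtrabackup --prepare --target-dir=/tmp/restore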

This is what I can see inside my S3 bucket:

% aws s3 ls ****************-percona-db-backups/2024-05-08-full_backup/
                           PRE app/
                           PRE mysql/
                           PRE performance_schema/
                           PRE sys/
2024-05-09 00:17:50        496 backup-my.cnf.00000000000000000000
2024-05-09 00:17:50         27 backup-my.cnf.00000000000000000001
2024-05-09 00:17:50        244 binlog.000150.00000000000000000000
2024-05-09 00:17:50         27 binlog.000150.00000000000000000001
2024-05-09 00:17:50         62 binlog.index.00000000000000000000
2024-05-09 00:17:50         26 binlog.index.00000000000000000001
2024-05-09 00:17:50     759599 ib_buffer_pool.00000000000000000000
2024-05-09 00:17:50         28 ib_buffer_pool.00000000000000000001
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000000
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000001
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000002
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000003
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000004
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000005
2024-05-09 00:17:23   10485801 ibdata1.00000000000000000006
2024-05-09 00:17:23    6291497 ibdata1.00000000000000000007
2024-05-09 00:17:23         21 ibdata1.00000000000000000008
2024-05-09 00:17:49   10485803 mysql.ibd.00000000000000000000
2024-05-09 00:17:49   10485803 mysql.ibd.00000000000000000001
2024-05-09 00:17:49   10485803 mysql.ibd.00000000000000000002
2024-05-09 00:17:49   10485803 mysql.ibd.00000000000000000003
2024-05-09 00:17:49    4194347 mysql.ibd.00000000000000000004
2024-05-09 00:17:49         23 mysql.ibd.00000000000000000005
2024-05-09 00:17:50   10485802 undo_001.00000000000000000000
2024-05-09 00:17:50    6291498 undo_001.00000000000000000001
2024-05-09 00:17:50         22 undo_001.00000000000000000002
2024-05-09 00:17:50   10485802 undo_002.00000000000000000000
2024-05-09 00:17:50    6291498 undo_002.00000000000000000001
2024-05-09 00:17:50         22 undo_002.00000000000000000002
2024-05-09 00:17:50        116 xtrabackup_binlog_info.00000000000000000000
2024-05-09 00:17:50         36 xtrabackup_binlog_info.00000000000000000001
2024-05-09 00:17:50        199 xtrabackup_checkpoints.00000000000000000000
2024-05-09 00:17:50         36 xtrabackup_checkpoints.00000000000000000001
2024-05-09 00:17:51        623 xtrabackup_info.00000000000000000000
2024-05-09 00:17:51         29 xtrabackup_info.00000000000000000001
2024-05-09 00:17:50       3124 xtrabackup_logfile.00000000000000000000
2024-05-09 00:17:50         32 xtrabackup_logfile.00000000000000000001
2024-05-09 00:17:51         95 xtrabackup_tablespaces.00000000000000000000
2024-05-09 00:17:51         36 xtrabackup_tablespaces.00000000000000000001
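
To rule out a partial upload, the total size of the backup prefix can be summed up as well:

aws s3 ls s3://****************-percona-db-backups/2024-05-08-full_backup/ --recursive --summarize --human-readable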

In my target environment, I have created a secret to store my AWS credentials, which are identical to the ones used above:

root@master-001:~/k8s/pxc# kubectl get secret  ****************-percona-db-backups -n percona -o yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: ************************
  AWS_SECRET_ACCESS_KEY: ********************
kind: Secret
metadata:
  creationTimestamp: "2024-05-09T14:55:28Z"
  name:  ****************-percona-db-backups
  namespace: percona
  resourceVersion: "5024887"
  uid: 2b468225-2d6f-490f-899d-67725b4f0d23
type: Opaque
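
For reference, this secret was created along these lines (the key names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the ones the operator documentation uses; the values here are placeholders):

kubectl create secret generic ****************-percona-db-backups -n percona \
    --from-literal=AWS_ACCESS_KEY_ID='<access-key-id>' \
    --from-literal=AWS_SECRET_ACCESS_KEY='<secret-access-key>'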

I have created the following Kubernetes manifest to request a restore:

root@master-001:~/k8s/pxc# cat initial-restore.yaml 
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: initial-restore
spec:
  pxcCluster: percona-cluster-pxc-d
  backupSource:
    destination: s3://****************-percona-db-backups/2024-05-08-full_backup
    s3:
      credentialsSecret: ****************-percona-db-backups
      endpointUrl: https://s3.eu-central-1.amazonaws.com
      region: eu-central-1
  pitr:
    type: latest
    backupSource:
      s3:
        bucket: "****************-percona-db-backups"
        credentialsSecret: ****************-percona-db-backups
        endpointUrl: https://s3.eu-central-1.amazonaws.com
        region: eu-central-1

And applying it also seemed to be fine:

root@master-001:~/k8s/pxc# kubectl apply -f initial-restore.yaml -n percona
perconaxtradbclusterrestore.pxc.percona.com/initial-restore created
root@master-001:~/k8s/pxc# kubectl describe pxc-restore/initial-restore -n percona
Name:         initial-restore
Namespace:    percona
Labels:       <none>
Annotations:  <none>
API Version:  pxc.percona.com/v1
Kind:         PerconaXtraDBClusterRestore
Metadata:
  Creation Timestamp:  2024-05-09T17:42:18Z
  Generation:          1
  Resource Version:    5052630
  UID:                 6917c770-e75d-45f8-b4d0-60fda2f9374b
Spec:
  Backup Source:
    Destination:  s3://****************-percona-db-backups/2024-05-08-full_backup
    s3:
      Credentials Secret:  ****************-percona-db-backups
      Endpoint URL:        https://s3.eu-central-1.amazonaws.com
      Region:              eu-central-1
  Pitr:
    Backup Source:
      s3:
        Bucket:              ****************-percona-db-backups
        Credentials Secret:  ****************-percona-db-backups
        Endpoint URL:        https://s3.eu-central-1.amazonaws.com
        Region:              eu-central-1
    Type:                    latest
  Pxc Cluster:               percona-cluster-pxc-d
Status:
  Comments:  You can view xtrabackup log:
$ kubectl logs job/restore-job-initial-restore-percona-cluster-pxc-d
If everything is fine, you can cleanup the job:
$ kubectl delete pxc-restore/initial-restore

  Completed:  2024-05-09T17:42:18Z
  State:      Succeeded
Events:       <none>

I have noticed that no job was created and decided to check the percona-operator logs to find the reason.
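
For the record, this is roughly how I checked that the job is missing, using the name suggested in the Comments field above:

kubectl get jobs -n percona
kubectl get job/restore-job-initial-restore-percona-cluster-pxc-d -n percona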

root@master-001:~# kubectl logs pod/percona-operator-pxc-operator-687f56d59d-4hkqj -n percona
2024-05-09T16:34:18.148Z	INFO	setup	Runs on	{"platform": "kubernetes", "version": "v1.27.1"}
2024-05-09T16:34:18.149Z	INFO	setup	Manager starting up	{"gitCommit": "c85a021f2a21441500b02a2c0b3d17e8a8b25996", "gitBranch": "release-1-14-0", "buildTime": "2024-03-01T09:01:29Z", "goVersion": "go1.21.7", "os": "linux", "arch": "amd64"}
2024-05-09T16:34:18.149Z	INFO	setup	Registering Components.
2024-05-09T16:34:19.300Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-percona-xtradbcluster"}
2024-05-09T16:34:19.300Z	INFO	setup	Starting the Cmd.
2024-05-09T16:34:19.300Z	INFO	controller-runtime.metrics	Starting metrics server
2024-05-09T16:34:19.300Z	INFO	controller-runtime.metrics	Serving metrics server	{"bindAddress": ":8080", "secure": false}
2024-05-09T16:34:19.300Z	INFO	starting server	{"kind": "health probe", "addr": "[::]:8081"}
2024-05-09T16:34:19.301Z	INFO	controller-runtime.webhook	Starting webhook server
2024-05-09T16:34:19.301Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
2024-05-09T16:34:19.301Z	INFO	controller-runtime.webhook	Serving webhook server	{"host": "", "port": 9443}
2024-05-09T16:34:19.301Z	INFO	controller-runtime.certwatcher	Starting certificate watcher
2024-05-09T16:34:19.302Z	INFO	attempting to acquire leader lease percona/08db1feb.percona.com...
2024-05-09T16:34:35.256Z	INFO	successfully acquired lease percona/08db1feb.percona.com
2024-05-09T16:34:35.256Z	INFO	Starting EventSource	{"controller": "pxc-controller", "source": "kind source: *v1.PerconaXtraDBCluster"}
2024-05-09T16:34:35.256Z	INFO	Starting Controller	{"controller": "pxc-controller"}
2024-05-09T16:34:35.256Z	INFO	Starting EventSource	{"controller": "pxcrestore-controller", "source": "kind source: *v1.PerconaXtraDBClusterRestore"}
2024-05-09T16:34:35.256Z	INFO	Starting Controller	{"controller": "pxcrestore-controller"}
2024-05-09T16:34:35.257Z	INFO	Starting EventSource	{"controller": "pxcbackup-controller", "source": "kind source: *v1.PerconaXtraDBClusterBackup"}
2024-05-09T16:34:35.257Z	INFO	Starting Controller	{"controller": "pxcbackup-controller"}
2024-05-09T16:34:35.376Z	INFO	Starting workers	{"controller": "pxcbackup-controller", "worker count": 1}
2024-05-09T16:34:35.405Z	INFO	Starting workers	{"controller": "pxc-controller", "worker count": 1}
2024-05-09T16:34:35.407Z	INFO	Starting workers	{"controller": "pxcrestore-controller", "worker count": 1}
2024-05-09T16:34:37.130Z	INFO	add new job	{"controller": "pxc-controller", "namespace": "percona", "name": "percona-cluster-pxc-d", "reconcileID": "d4b88387-4e1c-42a8-899a-026b61667557", "name": "ensure-version/percona/percona-cluster-pxc-d", "schedule": "0 4 * * *"}
2024-05-09T16:34:37.130Z	INFO	add new job	{"controller": "pxc-controller", "namespace": "percona", "name": "percona-cluster-pxc-d", "reconcileID": "d4b88387-4e1c-42a8-899a-026b61667557", "name": "telemetry/percona/percona-cluster-pxc-d", "schedule": "47 * * * *"}
root@master-001:~# kubectl logs pod/percona-operator-pxc-operator-687f56d59d-4hkqj -n percona --follow
[... same startup output as above ...]
2024-05-09T16:36:54.361Z	INFO	backup restore request	{"controller": "pxcrestore-controller", "namespace": "percona", "name": "initial-restore", "reconcileID": "30883363-e4e0-4dd6-93f1-6f8c00c85be1"}
2024-05-09T16:36:54.392Z	INFO	Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference	{"controller": "pxcrestore-controller", "namespace": "percona", "name": "initial-restore", "reconcileID": "30883363-e4e0-4dd6-93f1-6f8c00c85be1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x1634635]

goroutine 197 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1aa5fe0?, 0x2e707b0?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1.(*PXCBackupStatus).GetStorageType(...)
	/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/apis/pxc/v1/pxc_backup_types.go:140
github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup.RestoreJob(0xc000e901a0, 0xc000353200, 0xc000ad0a00, {0xc000c4ca00, 0x45}, 0x0)
	/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/pxc/backup/restore.go:140 +0x75
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*s3).Job(0xc00065e5d0?)
	/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restorer.go:38 +0x32
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).validate(0x1a8a960?, {0x204fa08, 0xc000f3ef90}, 0xc000e901a0, 0x204f998?, 0xc000ad0a00?)
	/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/restore.go:80 +0x4b
github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore.(*ReconcilePerconaXtraDBClusterRestore).Reconcile(0xc00065e5d0, {0x204fa08, 0xc000f3ef90}, {{{0xc000992606?, 0x5?}, {0xc000992610?, 0xc000ec6d08?}}})
	/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcrestore/controller.go:190 +0xf14
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x2053448?, {0x204fa08?, 0xc000f3ef90?}, {{{0xc000992606?, 0xb?}, {0xc000992610?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003f20a0, {0x204fa40, 0xc000607a90}, {0x1b4eb40?, 0xc000bf4560?})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003f20a0, {0x204fa40, 0xc000607a90})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 130
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:223 +0x565
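
In case more details help with triage, I can also share the full restore object and the exact operator image, e.g.:

kubectl get pxc-restore initial-restore -n percona -o yaml
kubectl get deployment percona-operator-pxc-operator -n percona -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'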

Could you please kindly tell me whether I am doing something wrong, or is this a defect? TIA