Percona XtraDB Cluster on Kubernetes mysql operator (Point-In-Time Recovery) not happening

Hi team,
I'm facing an issue while doing a Point-In-Time Recovery restore with the Percona MySQL operator in K8s.
As mentioned in the blog, for a PITR restore I should specify the backup name plus type: latest, and PITR must be turned on. I did turn PITR on, and all binlogs were being pushed to the GCS bucket.
After taking a backup via cron, a set of changes happened in the database, and I noticed those binlogs were also pushed to the GCS bucket.
To test PITR as described in the Percona blog, I gave the backup name plus type: latest, e.g. below:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: cluster1
  backupName: backup1
  pitr:
    type: latest
    date: "yyyy-mm-dd hh:mm:ss"
    gtid: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:nnn"
    backupSource:
      storageName: "STORAGE-NAME-HERE"
      s3:
        bucket: S3-BINLOG-BACKUP-BUCKET-NAME-HERE
        credentialsSecret: my-cluster-name-backup-s3
        endpointUrl: https://s3.us-west-2.amazonaws.com/
        region: us-west-2

But I'm facing an error, as below:

  Comments:  run restore: s3: nil s3 backup status
  State:     Failed
  Events:    <none>

I would appreciate any help. Thanks in advance!

Link : Providing Backups


Hello @mohamedkashifuddin,

first of all, let's try to remove everything that is not needed:

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: cluster1
  backupName: backup1
  pitr:
    type: latest
    backupSource:
      storageName: "STORAGE-NAME-HERE"

Could you please also check the log of the restore pod/container that is created?
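
For reference, something like this can be used to find and read those logs (the `restore-job-<restore-name>-<cluster-name>` job naming follows the hint the operator prints elsewhere in this thread; names and namespace here are illustrative):

```shell
# Find the restore job/pod the operator creates:
kubectl get jobs,pods -n <namespace> | grep restore
# Then view its log, e.g.:
kubectl logs job/restore-job-restore1-cluster1 -n <namespace>
```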


Hi @spronin,
The restore pod/container did not get created because the restore failed at the initial stage; I got the error message from the command below:
kubectl get pxc-restore restore1testpitr5 -o yaml

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"pxc.percona.com/v1","kind":"PerconaXtraDBClusterRestore","metadata":{"annotations":{},"name":"restore1testpitr5","namespace":"mysql-operator-rg-1"},"spec":{"backupSource":{"destination":"s3://mysql-temp/active/mysql-cluster-rg-1-2021-10-27-07:55:00-full"},"pitr":{"date":"2021-Oct-27 3:00:00","gtid":"da4a3206-362a-11ec-b4da-138a9df99657:74299-74342","s3":{"bucket":"s3://mysq--temp/active/","credentialsSecret":"my-cluster-name-backup-s3","endpointUrl":"https://storage.googleapis.com","region":"us-central1"},"type":"latest"},"pxcCluster":"mysql-cluster-rg-1"}}
  creationTimestamp: "2021-10-28T09:25:24Z"
  generation: 1
  name: restore1testpitr5
  namespace: mysql-operator-rg-1
  resourceVersion: "9840069"
  uid: 9c21df33-0e38-4041-a7e5-5bc253a1a7f7
spec:
  backupSource:
    destination: s3://mysql-temp/active/mysql-cluster-rg-1-2021-10-27-07:55:00-full
  pitr:
    date: 2021-Oct-27 3:00:00
    gtid: da4a3206-362a-11ec-b4da-138a9df99657:74299-74342
    s3:
      bucket: s3://mysql-temp/active/
      credentialsSecret: my-cluster-name-backup-s3
      endpointUrl: https://storage.googleapis.com
      region: us-central1
    type: latest
  pxcCluster: mysql-cluster-rg-1
status:
  comments: 'run restore: s3: nil s3 backup status'
  state: Failed


I still see a lot of fields that should not be there. Could you please trim them?

Also, there seems to be some confusion about the S3 bucket config:

  1. There are two buckets/storages: one for the PITR binlogs, another for regular backups.
  2. In the restore yaml you can reference storage either way: by storageName or by spelling out the S3 bucket.

One more thing: please tell me what your goal is; I can't tell from your restore.yaml. But if I assume that you need the following:

  1. A new k8s cluster, with a new PXC deployed: mysql-cluster-rg-1
  2. Both full backups and binary logs stored in the mysql-temp bucket, active folder
  3. To recover from the full backup and then to the latest date with PITR

then your restore.yaml should look like this:
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: mysql-cluster-rg-1
  backupSource:
    destination: s3://mysql-temp/active/mysql-cluster-rg-1-2021-10-27-07:55:00-full
    s3:
      credentialsSecret: my-cluster-name-backup-s3
      endpointUrl: https://storage.googleapis.com
      region: us-central1
  pitr:
    type: latest
    backupSource:
      s3:
        bucket: "mysql-temp/active"
        credentialsSecret: my-cluster-name-backup-s3
        endpointUrl: https://storage.googleapis.com
        region: us-central1        

As you can see, you need to specify the creds twice. It can be avoided if you use storageName.
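
For illustration only, a minimal sketch of the storageName variant (this assumes a pxc-backup object named backup1 exists in the namespace and that the PITR binlog storage is defined under backup.storages in the cluster CR under the name used here):

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: mysql-cluster-rg-1
  backupName: backup1
  pitr:
    type: latest
    backupSource:
      storageName: "STORAGE-NAME-HERE"
```

This way the credentials are read from the storage definition in the cluster CR instead of being repeated in the restore manifest.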


Hi @spronin
Thanks for your help, I appreciate it


Hi @spronin, I did the exact same steps you mentioned above, but only pxc-0 is up after the restore, not pxc-1 and pxc-2; please see the attachment. The operator logs are attached. Please suggest where I am going wrong or how to troubleshoot this. Thank you so much in advance.
operatorlogs.txt (3.0 KB)


Hello @ajay ,

not sure I understand what you did.

Are you sure you have 3 pods in your PXC StatefulSet and custom resource?
Please show kubectl describe sts pxc-db-pxc.


Hi @spronin, basically I had a PXC cluster with 3 instances (pods), and I did the restore using the manifest below, with the appropriate values changed.

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore1
spec:
  pxcCluster: mysql-cluster-rg-1
  backupSource:
    destination: s3://mysql-temp/active/mysql-cluster-rg-1-2021-10-27-07:55:00-full
    s3:
      credentialsSecret: my-cluster-name-backup-s3
      endpointUrl: https://storage.googleapis.com
      region: us-central1
  pitr:
    type: latest
    backupSource:
      s3:
        bucket: "mysql-temp/active"
        credentialsSecret: my-cluster-name-backup-s3
        endpointUrl: https://storage.googleapis.com
        region: us-central1 

and the result after the restore is as I posted above. (FYI: when the backup was made, I had 3 instances of PXC and ProxySQL in my k8s cluster in that namespace.)

And after the restore I could see only one instance of PXC; when I describe the PXC StatefulSet I see { Replicas: 1 desired | 1 total }, while the proxysql StatefulSet shows 3 replicas.

I tried scaling the PXC back up, but it is not happening; the operator logs are below:

{"level":"info","ts":1648476542.3586774,"caller":"pxc/upgrade.go:249","msg":"statefulSet was changed, run smart update"}
{"level":"info","ts":1648476542.3587465,"caller":"pxc/upgrade.go:262","msg":"can't start/continue 'SmartUpdate': waiting for all replicas are ready"}

Not sure why; please advise.
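
For reference, the scale-up I attempted was along these lines (a sketch; the CR name is a placeholder, substitute the one shown by kubectl get pxc):

```shell
# Set pxc.size back to 3 on the custom resource; the operator reconciles the StatefulSet:
kubectl patch pxc <cluster-name> --type=merge -p '{"spec":{"pxc":{"size":3}}}'
# Watch the pods:
kubectl get pods -w
```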
proxy-sts.txt (4.0 KB)
sts-pxc.txt (5.1 KB)


Weird; your stateful set has one node.

Could you please share your cr? kubectl get pxc pxc-db -o yaml ?


Hello @spronin - I've attached the yaml as pxc-db.txt
pxc-db.txt (6.6 KB)

I used the manifest below for the restore, and attached the restore pod logs as well.
logsrestore.txt (455.3 KB)

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore45
spec:
  pxcCluster: pxc-db
  backupSource:
    destination: s3://percona-backups-release-ops/backups/pxc-db-2022-03-14-00:00:00-full
    s3:
      credentialsSecret: pxc-db-s3-s3-us-east
      endpointUrl: https://s3.amazonaws.com
      region: us-east-1
  pitr:
    type: transaction
    gtid: "binlog_1645544965_d48447f6fe841c87e618e8fed8970ab1"
    backupSource:
      s3:
        bucket: "percona-backups-release-ops/pitr"
        credentialsSecret: pxc-db-s3-s3-us-east
        endpointUrl: https://s3.amazonaws.com
        region: us-east-1

So I see in pxc-db.txt that your pxc.size is 1, and you also have allowUnsafeConfigurations set to true, which suggests that you changed the size to 1. The behavior you see is therefore expected: one node is up.

Or are you saying you did not change pxc.size from 3 to 1, and it happened automatically somehow?

@spronin - setting allowUnsafeConfigurations to true is intentional. The size change from 3 to 1 happens automatically upon restore, regardless of spec.pitr.type=transaction/date/latest.

Sorry, I did not mention this earlier; below are the pxc-operator logs from when I run the restore job:

{"level":"error","ts":1648640228.7463877,"caller":"pxc/controller.go:1139","msg":"sync users","error":"exec syncusers: command terminated with exit code 1 /  / ERROR 2005 (HY000): Unknown MySQL server host 'pxc-db-pxc-0.pxc-db-pxc.pxc.svc.cluster.local' (2)\nERROR (line:1893) : Could not find a primary cluster node\n","errorVerbose":"exec syncusers: command terminated with exit code 1 /  / ERROR 2005 (HY000): Unknown MySQL server host 'pxc-db-pxc-0.pxc-db-pxc.pxc.svc.cluster.local' (2)\nERROR (line:1893) : Could not find a primary cluster node\n\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).syncPXCUsersWithProxySQL\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/users.go:491\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1137\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581","stacktrace":"github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc.(*ReconcilePerconaXtraDBCluster).resyncPXCUsersWithProxySQL.func1\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxc/controller.go:1139"}
{"level":"info","ts":1648640229.045958,"caller":"pxcrestore/controller.go:222","msg":"point-in-time recovering","cluster":"pxc-db"}
{"level":"info","ts":1648640229.61903,"caller":"pxc/controller.go:465","msg":"reconcile replication error","err":"get primary pxc pod: not found"}
{"level":"info","ts":1648640229.6228173,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640229.6228752,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640234.639589,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640234.6396563,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640240.72782,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640240.727888,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640243.1580393,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640243.1581097,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640246.8310893,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640246.8311534,"caller":"pxc/backup.go:87","msg":"Creating or updating backup job","name":"789e0-daily-backup","schedule":"0 0 * * *"}
{"level":"info","ts":1648640252.089586,"caller":"pxcrestore/controller.go:243","msg":"You can view xtrabackup log:\n$ kubectl logs job/restore-job-restore46-pxc-db\nIf everything is fine, you can cleanup the job:\n$ kubectl delete pxc-restore/restore46\n"}

FYI, I followed the procedure in the link below for the restore as well, and I have some issues there too.
(Please see my other question here (How to initialize a new cluster from a S3 backup? - #9 by ajay), where a restore from the latest backup does bring up 3 PXC nodes. I'm trying to target a specific date's backup to restore, but it always takes the latest, and my latest backup doesn't have the databases I manually imported; that is why I'm targeting a specific date's backup. Thoughts?)
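
In case it helps to see what I'm targeting, this is roughly how I look up a specific date's backup (illustrative commands; pxc-backup is the short name for the PerconaXtraDBClusterBackup objects, and the names/namespace are placeholders):

```shell
# List finished backups with their destinations:
kubectl get pxc-backup -n <namespace>
# Read the destination of the backup from the date I want,
# to paste into spec.backupSource.destination in restore.yaml:
kubectl get pxc-backup <backup-name> -n <namespace> -o jsonpath='{.status.destination}'
```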


Hi @spronin, any suggestions please?


Hello @ajay ,

sorry for not coming back earlier; I usually do forum reviews on Mondays :slight_smile:
I did the following to try to reproduce your issue:

  1. Spun up PXC cluster1 with the default cr.yaml
  2. Enabled PITR and took backups to the bucket (I used a new bucket)
  3. Spun up PXC cluster2 in another namespace with the default cr.yaml
  4. Used a restore.yaml similar to yours to recover to a specific transaction
  5. The cluster recovered successfully (all 3 nodes came back)

I was using GKE 1.21, with GCS for backups.
I was not playing with the allowUnsafeConfigurations flag.

Can you reproduce the issue reliably? Please break it down step by step and we will try to reproduce it on our end.
