Restore Percona XtraDB Cluster from XtraBackup on GCP: running into some problems

I have an old Percona XtraDB Cluster with 3 nodes that was deployed automatically on GCP, and we also deployed a new 3-node Percona XtraDB Cluster the same way. Now I am trying to move the data from the old cluster to the new one. My process is like this:

First, take a backup of the data on the old cluster:

xtrabackup --backup --target-dir=/data/backups/

Then rsync it to the new cluster:

rsync -avrP /data/backups/ 10.xxx.xxx.1:/data/mysql/

And set the ownership:

chown mysql:mysql -R /data/mysql

Then I start mysql. The first node is working now, but the other two nodes didn't sync. So I tried rsyncing the data to another node as well, but on the second node, after importing the data, mysql fails to start.

I also tried stopping mysql before rsyncing the data, but the result is the same: the first node always comes up successfully, while the second and third fail.

I have some questions. When I rsync the data to the new cluster, should I rsync it to every node? And should I stop mysql on the new nodes before the rsync? I also followed the official documentation and used --prepare and --copy-back, or innobackupex --apply-log (they look similar), but I still can't restore the data and get the nodes to sync.

I hope someone can tell me how to restore this properly. Thank you.

1 Like

My backup is 19 GB in size.
If I rsync it to just one node, can the other nodes sync from that?

1 Like

Steps:

  • xtrabackup --backup --target-dir=/data/backups/
  • rsync to server
  • xtrabackup --prepare
  • chown mysql
  • start mysql

When you start nodes 2 and 3, they should automatically SST from node 1. If that is failing, you need to troubleshoot it. Look at node 1's error log for any possible issues.
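As a rough command sketch of those steps, assuming the backup is rsync'd straight into what will become the datadir on the first new node (the paths and the destination IP are the ones used earlier in this thread):

# on a node of the old (source) cluster
xtrabackup --backup --target-dir=/data/backups/

# copy the backup to the first node of the new cluster
rsync -avrP /data/backups/ 10.xxx.xxx.1:/data/mysql/

# on that new node, with mysqld stopped: apply the redo log in place
xtrabackup --prepare --target-dir=/data/mysql/

# fix ownership, then start only this first node
chown -R mysql:mysql /data/mysql
service mysql start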

1 Like

Hi @Russel_HSU, thank you for posting! Welcome to the Percona forums! 🙂

Taking a backup and restoring to a new cluster with Percona XtraBackup follows these general steps from our documentation:

  1. Create the backup using xtrabackup --backup
  2. Copy the data to the new server
  3. Execute the --prepare phase
  4. Move the files into the datadir
  5. Set file permissions appropriately
  6. Start the binary for PXC
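For steps 3 to 5 in particular (the --prepare/--copy-back part asked about earlier in the thread), here is a minimal sketch, assuming the backup was copied to /data/backups/ on the restore target, mysqld is stopped, and the datadir configured in my.cnf is empty:

# apply the redo log to the copied backup (run against the backup directory, not the datadir)
xtrabackup --prepare --target-dir=/data/backups/

# copy the prepared files into the empty datadir taken from my.cnf (--move-back moves instead of copying)
xtrabackup --copy-back --target-dir=/data/backups/

# fix ownership before starting the node
chown -R mysql:mysql /var/lib/mysql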

Once you have one node running for PXC at the restored location, you will be better served to let PXC SST bootstrap the other nodes for you as it will take backups automatically and restore them for you.

If you’ve followed these steps and are still experiencing issues, can you show us the logs from your attempts to bootstrap node2 and node3 to the cluster?

In response to your questions: you can safely take a Percona XtraBackup backup from a running PXC instance. But when performing the --prepare and later operations on the instance to which you are restoring, you should not have a running mysqld, otherwise you will see corruption and other issues.

1 Like

Thank you for your help. I want to check some details and try again. From your reply, my understanding is that when I restore, there should not be a running mysqld. So before running --prepare on node1, I should stop mysql on all 3 nodes, then run --prepare on node1, copy-back or move-back the data, and start mysql on node1. If that succeeds, I can start node2 and node3, they should automatically SST from node1, and we are done. Right? And if it fails, I will check the logs on node2 and node3. Is the above correct?

1 Like

Thank you for your help. I want to ask: when I start node 2 and node 3, since my data is 19 GB, should I wait for the SST to finish? I am not sure how SST works. If I only restore on node 1 and then immediately start node 2 and node 3, do I need to wait?

1 Like

when I restore, should not have a running mysqld

Correct.

SST = State Snapshot Transfer. The SST process uses xtrabackup to take a backup and stream it to the joining node, so restoring a backup or using SST accomplishes the same thing and should take about the same amount of total time. The SST process is recommended because it is handled automatically.

1 Like

Hello, I tried the restore again. This time, before --prepare, I shut down all 3 nodes, then ran --prepare and --copy-back on node1. The result is the same: after copying the data and chown'ing it to mysql, node 1 starts mysql successfully, but node 2 fails when I start it. I checked the logs on node 2. Can you help me?

2021-05-10T02:54:50.702855Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.33-36-57) starting as process 5548 ...
2021-05-10T02:54:50.707897Z 0 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T02:54:50.707922Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2021-05-10T02:54:50.707929Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2021-05-10T02:54:50.710275Z 0 [Note] WSREP: wsrep_load(): Galera 3.49(r0ef0d79) by Codership Oy <info@codership.com> loaded successfully.
2021-05-10T02:54:50.710363Z 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-05-10T02:54:50.710923Z 0 [Note] WSREP: Found saved state: f074dcea-9adb-11eb-a592-46c651bdce73:8, safe_to_bootstrap: 0
2021-05-10T02:54:50.728798Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.101.0.61; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 9; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; 
2021-05-10T02:54:50.764128Z 0 [Note] WSREP: GCache history reset: f074dcea-9adb-11eb-a592-46c651bdce73:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.765806Z 0 [Note] WSREP: Assign initial position for certification: 8, protocol version: -1
2021-05-10T02:54:50.765934Z 0 [Note] WSREP: Preparing to initiate SST/IST
2021-05-10T02:54:50.765952Z 0 [Note] WSREP: Starting replication
2021-05-10T02:54:50.765977Z 0 [Note] WSREP: Setting initial position to f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.766254Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-05-10T02:54:50.766390Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2021-05-10T02:54:50.766540Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766550Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766804Z 0 [Note] WSREP: GMCast version 0
2021-05-10T02:54:50.778574Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-05-10T02:54:50.778624Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-05-10T02:54:50.779105Z 0 [Note] WSREP: EVS version 0
2021-05-10T02:54:50.779222Z 0 [Note] WSREP: gcomm: connecting to group 'test', peer 'test-percona-perconaxtradb-vm-0:,test-percona-perconaxtradb-vm-1:,test-percona-perconaxtradb-vm-2:'
2021-05-10T02:54:50.779864Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.101.0.61:4567
2021-05-10T02:54:50.785442Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') connection established to 55248078 tcp://10.101.0.62:4567
2021-05-10T02:54:50.785551Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2021-05-10T02:54:51.036765Z 0 [Note] WSREP: declaring 55248078 at tcp://10.101.0.62:4567 stable
2021-05-10T02:54:51.037502Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,16bdf161,6)
memb {
	16bdf161,0
	55248078,0
	}
joined {
	}
left {
	}
partitioned {
	6263777e,0
	66f31338,0
	}
)
2021-05-10T02:54:54.281022Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-05-10T02:55:21.290606Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
	 at gcomm/src/pc.cpp:connect():161
2021-05-10T02:55:21.290657Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2021-05-10T02:55:21.290817Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1527: Failed to open channel 'test' at 'gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2': -110 (Connection timed out)
2021-05-10T02:55:21.290832Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2021-05-10T02:55:21.290841Z 0 [ERROR] WSREP: Provider/Node (gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2) failed to establish connection with cluster (reason: 7)
2021-05-10T02:55:21.290846Z 0 [ERROR] Aborting

2021-05-10T02:55:21.290867Z 0 [Note] Giving 0 client threads a chance to die gracefully
2021-05-10T02:55:21.290875Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2021-05-10T02:55:21.290879Z 0 [Note] WSREP: Service disconnected.
2021-05-10T02:55:21.290882Z 0 [Note] WSREP: Waiting to close threads......
2021-05-10T02:55:26.291127Z 0 [Note] WSREP: Some threads may fail to exit.
2021-05-10T02:55:26.302798Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T02:55:26.302932Z 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2021-05-10T02:55:26.303022Z 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 0)
2021-05-10T02:55:26.306561Z 0 [Note] Binlog end
2021-05-10T02:55:26.306721Z 0 [Note] /usr/sbin/mysqld: Shutdown complete 

I saw it try to SST, but it logs: WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory).
Is that the main problem? How can I resolve it?

1 Like

Your gcomm:// address should include the port (:4567) on each node explicitly, e.g. gcomm://server1:4567,server2:4567
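For example, with the node names that appear in the log above, the relevant my.cnf line would look something like this (a sketch; adjust to your own configuration):

# wsrep section of /etc/mysql/my.cnf on each node
wsrep_cluster_address=gcomm://test-percona-perconaxtradb-vm-0:4567,test-percona-perconaxtradb-vm-1:4567,test-percona-perconaxtradb-vm-2:4567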

Also, any firewalls? iptables?

1 Like

Yes, it is allowed. The GCP deployment builds the firewall rules automatically, and I checked that this traffic is allowed.

1 Like

When node1 is running, go to node2 and run 'telnet node1 4567'. Do you get a response, or does it time out? If it times out, you have a firewall/network issue.

1 Like

Yes, it is open:

root@test-percona-perconaxtradb-vm-1:~# nmap 10.101.0.66 -p 4567

Starting Nmap 7.40 ( https://nmap.org ) at 2021-05-11 01:50 UTC
Nmap scan report for test-percona-perconaxtradb-vm-0.c.smeeth-app.internal (10.101.0.66)
Host is up (0.0016s latency).
PORT     STATE SERVICE
4567/tcp open  tram

Nmap done: 1 IP address (1 host up) scanned in 0.54 seconds

And I tried again. Here are some of the logs.

2021-05-10T10:06:59.725982Z 1 [Note] WSREP: State transfer required: 
	Group state: f074dcea-9adb-11eb-a592-46c651bdce73:8
	Local state: 00000000-0000-0000-0000-000000000000:-1
2021-05-10T10:06:59.725998Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726002Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726013Z 1 [Note] WSREP: New cluster view: global state: f074dcea-9adb-11eb-a592-46c651bdce73:8, view# 5: Primary, number of nodes: 2, my index: 0, protocol version 3
2021-05-10T10:06:59.726016Z 1 [Note] WSREP: Setting wsrep_ready to true
2021-05-10T10:06:59.726020Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-05-10T10:06:59.726023Z 1 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T10:06:59.726118Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57'   '' )
	2021-05-10T10:07:00.187506Z WSREP_SST: [INFO] Streaming with xbstream
2021-05-10T10:07:00.507815Z 1 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|10.101.0.64:4444/xtrabackup_sst//1
2021-05-10T10:07:00.507851Z 1 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 1) (Increment: 1 -> 2)
2021-05-10T10:07:00.507863Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-05-10T10:07:00.507901Z 1 [Note] WSREP: Assign initial position for certification: 8, protocol version: 4
2021-05-10T10:07:00.507939Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T10:07:00.507956Z 1 [Note] WSREP: Check if state gap can be serviced using IST
2021-05-10T10:07:00.507971Z 1 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: f074dcea-9adb-11eb-a592-46c651bdce73
2021-05-10T10:07:00.508016Z 1 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2021-05-10T10:07:00.508024Z 1 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f074dcea-9adb-11eb-a592-46c651bdce73): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():516. IST will be unavailable.
2021-05-10T10:07:00.508766Z 0 [Note] WSREP: Member 0.0 (test-percona-perconaxtradb-vm-1) requested state transfer from '*any*'. Selected 1.0 (test-percona-perconaxtradb-vm-0)(SYNCED) as donor.
2021-05-10T10:07:00.508784Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8)
2021-05-10T10:07:00.508833Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-05-10T10:07:00.508852Z 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
	2021-05-10T10:07:01.292094Z WSREP_SST: [INFO] Streaming with xbstream
	2021-05-10T10:07:01.302486Z WSREP_SST: [INFO] Proceeding with SST.........
	2021-05-10T10:07:01.317911Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2021-05-10T10:07:02.224884Z 0 [Note] WSREP: (7500ee6f, 'tcp://0.0.0.0:4567') turning message relay requesting off
	2021-05-10T10:07:13.813079Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
	2021-05-10T10:07:13.814324Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
	2021-05-10T10:07:13.815563Z WSREP_SST: [ERROR] ****************************************************** 
	2021-05-10T10:07:13.816979Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2021-05-10T10:07:13.824639Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57'   '' : 2 (No such file or directory)
2021-05-10T10:07:13.824673Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2021-05-10T10:07:13.824680Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2021-05-10T10:07:13.824715Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2021-05-10T10:07:13.824725Z 0 [ERROR] Aborting

I saw some errors that seem to be about the UUID.
And this one:

WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log 

Where can I find the DONOR log?

1 Like

It's not telling you to look at a file called DONOR.log; it's telling you to look at the log file on the DONOR node. So go to the donor node and look in mysql's data directory. There should be some logs in there for mysql and for the xtrabackup process.
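As a sketch, assuming the default /var/lib/mysql datadir used in this thread (the exact log file names can vary by PXC/xtrabackup version; on 5.7 with the xtrabackup-v2 SST they are typically named innobackup.*.log):

# on the donor node: xtrabackup output from the SST attempt
ls -l /var/lib/mysql/innobackup*.log
tail -n 100 /var/lib/mysql/innobackup.backup.log

# the donor's mysql error log should also record why the SST-side backup failed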

1 Like

Hello. I have been practicing this over the past few days, but it often fails, and I have one more question.
When I have the new cluster and stop mysql on all of its nodes before restoring the backup,
should I delete all the existing mysql data on all the nodes?
So my steps are:

  • xtrabackup --backup --target-dir=/data/backups/
  • rsync to server
  • xtrabackup --prepare
  • stop mysql on all nodes of the new cluster
  • rm -r /data/mysql/*
  • copy-back data
  • chown mysql
  • start mysql

I want to ask: after stopping mysql everywhere, should I delete all the mysql data on all the nodes, or only on the single server where I run --prepare and copy back the data?

1 Like

When I follow these steps,
the first node has the copied data and I start it with:

/etc/init.d/mysql bootstrap-pxc

Then I check the mysql status and it is still stopped, so I start mysql:

service mysql start

Then I start node2 and node3. They are connected; I checked with:

mysql> show status like 'wsrep%';

So now the first node has the copied data, node 2 and node 3 have no data, and mysql is started on all of them.

Question: how can I tell whether the SST is running?

1 Like

You can watch mysql’s error log to see the SST process. There’s also an xtrabackup log file in the datadir while the SST is happening.
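A rough sketch of what that looks like in practice (paths assume the defaults seen earlier in the thread; adjust to your my.cnf):

# on the joining node: follow the error log; an sst_in_progress marker and xtrabackup
# files appear in the datadir while the transfer is running
tail -f /var/log/mysqld.log
ls /var/lib/mysql/

# on the donor (the restored first node): its state reads Donor/Desynced during the SST
# and returns to Synced when the transfer is finished
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"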

1 Like

My scenario is similar to the OP's, but I'm having trouble restoring an xtrabackup to a freshly set up cluster using this guide. So far there are no databases; everything is default and as per the configs from the how-to I followed.

The backup was taken from an older environment running XtraDB Cluster 5.7 using xtrabackup 2.4. I streamed the backup file to a node in the new cluster and extracted it there. I installed xtrabackup 2.4 on the new node, as I had trouble running the prepare phase using 8.0.

First I stop mysql on all the nodes. After extracting and preparing, I clear the mysql data directory and copy over the prepared backup files. At this point node 1 only contains the backup files.

I then bootstrap this node and it fails. Is there a guide somewhere I haven’t come across in the last year of trying to get this right?

root@db-node-01:~# service mysql@bootstrap.service start
Job for mysql@bootstrap.service.service failed because of unavailable resources or another system error.
See "systemctl status mysql@bootstrap.service.service" and "journalctl -xe" for details.
root@db-node-01:~# journalctl -xe
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to load environment files: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to run 'start-pre' task: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed with result 'resources'.

Looking at the log files, I see errors about the SSL certificates that were generated when I initially installed pxc. Luckily I saved those, so I moved them back and tried again, no luck.

I just wish that there was a simple guide to follow. I would have thought this is a very common usage scenario ie backup on old 5.7 cluster and restore on new 8.0 cluster but alas I cannot find anything