Restore Percona XtraDB Cluster from XtraBackup on GCP: running into some problems

I have an old Percona XtraDB Cluster with 3 nodes that was deployed automatically on GCP, and we also deployed a new 3-node Percona XtraDB Cluster the same way. Now I am trying to move the data from the old cluster to the new one. My process is like this:

First, take a backup of the data on the old cluster:

xtrabackup --backup --target-dir=/data/backups/

Then rsync it to the new cluster:

rsync -avrP /data/backups/ 10.xxx.xxx.1:/data/mysql/

And set the ownership:

chown mysql:mysql -R /data/mysql

Then I start mysql. The first node is working now, but the other two nodes didn't sync. So I tried rsyncing the data to another node as well, but on the second node, after importing the data, mysql fails to start.

I also tried stopping mysql before rsyncing the data, but the result is the same: the first node always comes up successfully, while the second and third fail.

I have some questions. When I rsync the data to the new cluster, should I rsync it to every node? And should I stop mysql on the new nodes before the rsync? I also followed the official documentation and used --prepare and --copy-back, or innobackupex --apply-log (they look similar), but I still can't restore the data and get the nodes to sync.

I hope someone can tell me how to restore this properly. Thank you.

1 Like

My backup is 19 GB in size.
If I rsync it to just one node, can the other nodes sync from that?

1 Like

Steps:

  • xtrabackup --backup --target-dir=/data/backups/
  • rsync to server
  • xtrabackup --prepare
  • chown mysql
  • start mysql

When you start nodes 2 and 3, they should automatically SST from node 1. If that is failing, you need to troubleshoot it. Look at node 1's error log for any possible issues.
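As a rough command sketch of those steps, assuming the backup is rsync'd straight into what will become the datadir on the first new node (the paths and the destination IP are the ones used earlier in this thread):

# on a node of the old (source) cluster
xtrabackup --backup --target-dir=/data/backups/

# copy the backup to the first node of the new cluster
rsync -avrP /data/backups/ 10.xxx.xxx.1:/data/mysql/

# on that new node, with mysqld stopped: apply the redo log in place
xtrabackup --prepare --target-dir=/data/mysql/

# fix ownership, then start only this first node
chown -R mysql:mysql /data/mysql
service mysql start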

1 Like

Hi @Russel_HSU, thank you for posting! Welcome to the Percona forums! 🙂

Taking a backup and restoring to a new cluster with Percona XtraBackup follows these general steps from our documentation:

  1. Create the backup using xtrabackup --backup
  2. Copy the data to the new server
  3. Execute the --prepare phase
  4. Move the files into the datadir
  5. Set file permissions appropriately
  6. Start the binary for PXC
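For steps 3 to 5 in particular (the --prepare/--copy-back part asked about earlier in the thread), here is a minimal sketch, assuming the backup was copied to /data/backups/ on the restore target, mysqld is stopped, and the datadir configured in my.cnf is empty:

# apply the redo log to the copied backup (run against the backup directory, not the datadir)
xtrabackup --prepare --target-dir=/data/backups/

# copy the prepared files into the empty datadir taken from my.cnf (--move-back moves instead of copying)
xtrabackup --copy-back --target-dir=/data/backups/

# fix ownership before starting the node
chown -R mysql:mysql /var/lib/mysql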

Once you have one node running for PXC at the restored location, you will be better served to let PXC SST bootstrap the other nodes for you as it will take backups automatically and restore them for you.

If you’ve followed these steps and are still experiencing issues, can you show us the logs from your attempts to bootstrap node2 and node3 to the cluster?

In response to your questions: you can safely take a Percona XtraBackup backup from a running PXC instance. But when performing the --prepare and later operations on the instance to which you are restoring, you should not have a running mysqld, otherwise you will see corruption and other issues.

1 Like

Thank you for your help. I want to check some details and try again. From your reply, my understanding is that when I restore, there should not be a running mysqld. So before running --prepare on node1, I should stop mysql on all 3 nodes, then run --prepare on node1, copy-back or move-back the data, and start mysql on node1. If that succeeds, I can start node2 and node3, they should automatically SST from node1, and we are done. Right? And if it fails, I will check the logs on node2 and node3. Is the above correct?

1 Like

Thank you for your help. I want to ask: when I start node 2 and node 3, since my data is 19 GB, should I wait for the SST to finish? I am not sure how SST works. If I only restore on node 1 and then immediately start node 2 and node 3, do I need to wait?

1 Like

when I restore, should not have a running mysqld

Correct.

SST = State Snapshot Transfer. The SST process uses xtrabackup to take a backup and stream it to the joining node, so restoring a backup or using SST accomplishes the same thing and should take about the same amount of total time. The SST process is recommended because it is handled automatically.

1 Like

Hello, I tried the restore again. This time, before --prepare, I shut down all 3 nodes, then ran --prepare and --copy-back on node1. The result is the same: after copying the data and chown'ing it to mysql, node 1 starts mysql successfully, but node 2 fails when I start it. I checked the logs on node 2. Can you help me?

2021-05-10T02:54:50.702855Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.33-36-57) starting as process 5548 ...
2021-05-10T02:54:50.707897Z 0 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T02:54:50.707922Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2021-05-10T02:54:50.707929Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2021-05-10T02:54:50.710275Z 0 [Note] WSREP: wsrep_load(): Galera 3.49(r0ef0d79) by Codership Oy <info@codership.com> loaded successfully.
2021-05-10T02:54:50.710363Z 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-05-10T02:54:50.710923Z 0 [Note] WSREP: Found saved state: f074dcea-9adb-11eb-a592-46c651bdce73:8, safe_to_bootstrap: 0
2021-05-10T02:54:50.728798Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.101.0.61; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 9; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto; 
2021-05-10T02:54:50.764128Z 0 [Note] WSREP: GCache history reset: f074dcea-9adb-11eb-a592-46c651bdce73:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.765806Z 0 [Note] WSREP: Assign initial position for certification: 8, protocol version: -1
2021-05-10T02:54:50.765934Z 0 [Note] WSREP: Preparing to initiate SST/IST
2021-05-10T02:54:50.765952Z 0 [Note] WSREP: Starting replication
2021-05-10T02:54:50.765977Z 0 [Note] WSREP: Setting initial position to f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.766254Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-05-10T02:54:50.766390Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2021-05-10T02:54:50.766540Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766550Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766804Z 0 [Note] WSREP: GMCast version 0
2021-05-10T02:54:50.778574Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-05-10T02:54:50.778624Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-05-10T02:54:50.779105Z 0 [Note] WSREP: EVS version 0
2021-05-10T02:54:50.779222Z 0 [Note] WSREP: gcomm: connecting to group 'test', peer 'test-percona-perconaxtradb-vm-0:,test-percona-perconaxtradb-vm-1:,test-percona-perconaxtradb-vm-2:'
2021-05-10T02:54:50.779864Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.101.0.61:4567
2021-05-10T02:54:50.785442Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') connection established to 55248078 tcp://10.101.0.62:4567
2021-05-10T02:54:50.785551Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2021-05-10T02:54:51.036765Z 0 [Note] WSREP: declaring 55248078 at tcp://10.101.0.62:4567 stable
2021-05-10T02:54:51.037502Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,16bdf161,6)
memb {
	16bdf161,0
	55248078,0
	}
joined {
	}
left {
	}
partitioned {
	6263777e,0
	66f31338,0
	}
)
2021-05-10T02:54:54.281022Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-05-10T02:55:21.290606Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
	 at gcomm/src/pc.cpp:connect():161
2021-05-10T02:55:21.290657Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2021-05-10T02:55:21.290817Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1527: Failed to open channel 'test' at 'gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2': -110 (Connection timed out)
2021-05-10T02:55:21.290832Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2021-05-10T02:55:21.290841Z 0 [ERROR] WSREP: Provider/Node (gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2) failed to establish connection with cluster (reason: 7)
2021-05-10T02:55:21.290846Z 0 [ERROR] Aborting

2021-05-10T02:55:21.290867Z 0 [Note] Giving 0 client threads a chance to die gracefully
2021-05-10T02:55:21.290875Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2021-05-10T02:55:21.290879Z 0 [Note] WSREP: Service disconnected.
2021-05-10T02:55:21.290882Z 0 [Note] WSREP: Waiting to close threads......
2021-05-10T02:55:26.291127Z 0 [Note] WSREP: Some threads may fail to exit.
2021-05-10T02:55:26.302798Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T02:55:26.302932Z 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2021-05-10T02:55:26.303022Z 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 0)
2021-05-10T02:55:26.306561Z 0 [Note] Binlog end
2021-05-10T02:55:26.306721Z 0 [Note] /usr/sbin/mysqld: Shutdown complete 

I saw it try to SST, but it logs: WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory).
Is that the main problem? How can I resolve it?

1 Like

Your gcomm:// address should include the port (:4567) on each node explicitly, e.g. gcomm://server1:4567,server2:4567
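For example, with the node names that appear in the log above, the relevant my.cnf line would look something like this (a sketch; adjust to your own configuration):

# wsrep section of /etc/mysql/my.cnf on each node
wsrep_cluster_address=gcomm://test-percona-perconaxtradb-vm-0:4567,test-percona-perconaxtradb-vm-1:4567,test-percona-perconaxtradb-vm-2:4567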

Also, any firewalls? iptables?

1 Like

Yes, it is allowed. The GCP deployment builds the firewall rules automatically, and I checked that this traffic is allowed.

1 Like

When node1 is running, go to node2 and run 'telnet node1 4567'. Do you get a response, or does it time out? If it times out, you have a firewall/network issue.

1 Like

Yes, it is open:

root@test-percona-perconaxtradb-vm-1:~# nmap 10.101.0.66 -p 4567

Starting Nmap 7.40 ( https://nmap.org ) at 2021-05-11 01:50 UTC
Nmap scan report for test-percona-perconaxtradb-vm-0.c.smeeth-app.internal (10.101.0.66)
Host is up (0.0016s latency).
PORT     STATE SERVICE
4567/tcp open  tram

Nmap done: 1 IP address (1 host up) scanned in 0.54 seconds

And I tried again. Here are some of the logs.

2021-05-10T10:06:59.725982Z 1 [Note] WSREP: State transfer required: 
	Group state: f074dcea-9adb-11eb-a592-46c651bdce73:8
	Local state: 00000000-0000-0000-0000-000000000000:-1
2021-05-10T10:06:59.725998Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726002Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726013Z 1 [Note] WSREP: New cluster view: global state: f074dcea-9adb-11eb-a592-46c651bdce73:8, view# 5: Primary, number of nodes: 2, my index: 0, protocol version 3
2021-05-10T10:06:59.726016Z 1 [Note] WSREP: Setting wsrep_ready to true
2021-05-10T10:06:59.726020Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-05-10T10:06:59.726023Z 1 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T10:06:59.726118Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57'   '' )
	2021-05-10T10:07:00.187506Z WSREP_SST: [INFO] Streaming with xbstream
2021-05-10T10:07:00.507815Z 1 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|10.101.0.64:4444/xtrabackup_sst//1
2021-05-10T10:07:00.507851Z 1 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 1) (Increment: 1 -> 2)
2021-05-10T10:07:00.507863Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-05-10T10:07:00.507901Z 1 [Note] WSREP: Assign initial position for certification: 8, protocol version: 4
2021-05-10T10:07:00.507939Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T10:07:00.507956Z 1 [Note] WSREP: Check if state gap can be serviced using IST
2021-05-10T10:07:00.507971Z 1 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: f074dcea-9adb-11eb-a592-46c651bdce73
2021-05-10T10:07:00.508016Z 1 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2021-05-10T10:07:00.508024Z 1 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f074dcea-9adb-11eb-a592-46c651bdce73): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():516. IST will be unavailable.
2021-05-10T10:07:00.508766Z 0 [Note] WSREP: Member 0.0 (test-percona-perconaxtradb-vm-1) requested state transfer from '*any*'. Selected 1.0 (test-percona-perconaxtradb-vm-0)(SYNCED) as donor.
2021-05-10T10:07:00.508784Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8)
2021-05-10T10:07:00.508833Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-05-10T10:07:00.508852Z 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
	2021-05-10T10:07:01.292094Z WSREP_SST: [INFO] Streaming with xbstream
	2021-05-10T10:07:01.302486Z WSREP_SST: [INFO] Proceeding with SST.........
	2021-05-10T10:07:01.317911Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2021-05-10T10:07:02.224884Z 0 [Note] WSREP: (7500ee6f, 'tcp://0.0.0.0:4567') turning message relay requesting off
	2021-05-10T10:07:13.813079Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 
	2021-05-10T10:07:13.814324Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
	2021-05-10T10:07:13.815563Z WSREP_SST: [ERROR] ****************************************************** 
	2021-05-10T10:07:13.816979Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2021-05-10T10:07:13.824639Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57'   '' : 2 (No such file or directory)
2021-05-10T10:07:13.824673Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2021-05-10T10:07:13.824680Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2021-05-10T10:07:13.824715Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2021-05-10T10:07:13.824725Z 0 [ERROR] Aborting

I saw some errors that seem to be about the UUID.
And this one:

WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log 

Where can I find the DONOR log?

1 Like

It's not telling you to look at a file called DONOR.log; it's telling you to look at the log file on the DONOR node. So go to the donor node and look in mysql's data directory. There should be some logs in there for mysql and for the xtrabackup process.
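As a sketch, assuming the default /var/lib/mysql datadir used in this thread (the exact log file names can vary by PXC/xtrabackup version; on 5.7 with the xtrabackup-v2 SST they are typically named innobackup.*.log):

# on the donor node: xtrabackup output from the SST attempt
ls -l /var/lib/mysql/innobackup*.log
tail -n 100 /var/lib/mysql/innobackup.backup.log

# the donor's mysql error log should also record why the SST-side backup failed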

1 Like

Hello. I have been practicing this over the past few days, but it often fails, and I have one more question.
When I have the new cluster and stop mysql on all of its nodes before restoring the backup,
should I delete all the existing mysql data on all the nodes?
So my steps are:

  • xtrabackup --backup --target-dir=/data/backups/
  • rsync to server
  • xtrabackup --prepare
  • stop mysql on all nodes of the new cluster
  • rm -r /data/mysql/*
  • copy-back data
  • chown mysql
  • start mysql

I want to ask: after stopping mysql everywhere, should I delete all the mysql data on all the nodes, or only on the single server where I run --prepare and copy back the data?

1 Like

When I follow these steps,
the first node has the copied data and I start it with:

/etc/init.d/mysql bootstrap-pxc

Then I check the mysql status and it is still stopped, so I start mysql:

service mysql start

Then I start node2 and node3. They are connected; I checked with:

mysql> show status like 'wsrep%';

So now the first node has the copied data, node 2 and node 3 have no data, and mysql is started on all of them.

Question: how can I tell whether the SST is running?

1 Like

You can watch mysql’s error log to see the SST process. There’s also an xtrabackup log file in the datadir while the SST is happening.
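A rough sketch of what that looks like in practice (paths assume the defaults seen earlier in the thread; adjust to your my.cnf):

# on the joining node: follow the error log; an sst_in_progress marker and xtrabackup
# files appear in the datadir while the transfer is running
tail -f /var/log/mysqld.log
ls /var/lib/mysql/

# on the donor (the restored first node): its state reads Donor/Desynced during the SST
# and returns to Synced when the transfer is finished
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"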

1 Like

My scenario is similar to the OP's, but I'm having trouble restoring an xtrabackup to a freshly set up cluster using this guide. So far there are no databases; everything is default and as per the configs from the how-to I followed.

The backup was taken from an older environment running XtraDB Cluster 5.7 using xtrabackup 2.4. I streamed the backup file to a node in the new cluster and extracted it there. I installed xtrabackup 2.4 on the new node, as I had trouble running the prepare phase using 8.0.

First I stop mysql on all the nodes. After extracting and preparing, I clear the mysql data directory and copy over the prepared backup files. At this point node 1 only contains the backup files.

I then bootstrap this node and it fails. Is there a guide somewhere I haven’t come across in the last year of trying to get this right?

root@db-node-01:~# service mysql@bootstrap.service start
Job for mysql@bootstrap.service.service failed because of unavailable resources or another system error.
See "systemctl status mysql@bootstrap.service.service" and "journalctl -xe" for details.
root@db-node-01:~# journalctl -xe
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to load environment files: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to run 'start-pre' task: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed with result 'resources'.

Looking at the log files, I see errors about the SSL certificates that were generated when I initially installed pxc. Luckily I saved those, so I moved them back and tried again, no luck.

I just wish that there was a simple guide to follow. I would have thought this is a very common usage scenario ie backup on old 5.7 cluster and restore on new 8.0 cluster but alas I cannot find anything