I have an old Percona XtraDB Cluster with 3 nodes that was deployed automatically on GCP, and we also deployed a new 3-node Percona XtraDB Cluster the same way. Now I am trying to restore the data from the old cluster into the new one. My process is like this:
Then I start mysql. The first node works now, but the other two nodes didn't sync. So I tried rsyncing the data to the second node, but after importing the data, mysql fails to start there.
I also tried stopping mysql before rsyncing the data, but the result is the same: the first node always starts successfully, while the second and third fail.
I have some questions. When I rsync data to the new cluster, should I rsync to all nodes? And before the rsync, should I stop mysql on the new nodes? I also followed the official documentation and used --prepare and --copy-back, or innobackupex --apply-log (they look similar), but I still can't restore the data and get the nodes to sync.
When you start node 2 and 3, they should automatically SST from node 1. If this is failing, you need to troubleshoot it. Look at node 1's error log for any possible issues.
Once you have one PXC node running at the restored location, you are better served letting PXC's SST bootstrap the other nodes, since it takes and restores the backups for you automatically.
If you've followed these steps and are still experiencing issues, can you show us the logs from your attempts to bootstrap node2 and node3 into the cluster?
In response to your questions: you can safely take a Percona XtraBackup backup from a running PXC instance. But when performing --prepare and the later operations on the instance you are restoring to, you must not have a running mysqld; otherwise you will see corruption and other issues.
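A minimal sketch of that restore flow, with mysqld stopped on the target node (the backup path `/data/backups/full` and the datadir are assumptions; adjust for your layout):

```shell
# mysqld must NOT be running on this node during prepare/copy-back
sudo systemctl stop mysql

# Apply the redo log so the data files are consistent
xtrabackup --prepare --target-dir=/data/backups/full

# copy-back requires an empty datadir; then fix ownership
sudo rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/data/backups/full --datadir=/var/lib/mysql
sudo chown -R mysql:mysql /var/lib/mysql
```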
Thanks for your help. I want to check some details and try again. From your reply, my understanding is that there should be no running mysqld while I restore. So before --prepare, I should stop mysql on all 3 nodes, run --prepare on node 1, then --copy-back (or --move-back) the data and start mysql on node 1. If that succeeds, I can start node 2 and node 3, and they should automatically SST from node 1, and we're done. Right? And if it fails, I will check the node 2 and node 3 logs. Is the above correct?
Thank you for your help. I want to ask: when I start node 2 and node 3, since my data is 19 GB, should I wait for the SST? I'm not sure how SST works. If I just restore on node 1 and then quickly start node 2 and node 3, should I wait?
SST = State Snapshot Transfer. The SST process uses xtrabackup to take a backup and stream it to the joining node. So restoring a backup manually and letting SST do it accomplish the same thing and should take about the same total time. SST is recommended because it is handled automatically.
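On the "should I wait" question: yes, the joiner is not usable until the transfer finishes. Two hedged ways to watch it (the error-log path is an assumption; check the `log_error` setting in your my.cnf):

```shell
# During SST the joiner does not accept connections yet,
# so follow its error log to watch progress:
sudo tail -f /var/log/mysqld.log

# Once the node accepts connections, confirm it has caught up:
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# the node is ready when the value is 'Synced'
```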
Hello, I tried the restore again. This time, before running --prepare, I had stopped all 3 nodes. Then I ran --prepare and --copy-back on node1. The result is the same: after copying the data and running chown mysql, node 1 starts mysql successfully, but node 2 fails to start. I checked the logs on node 2. Can you help me?
2021-05-10T02:54:50.702855Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.33-36-57) starting as process 5548 ...
2021-05-10T02:54:50.707897Z 0 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T02:54:50.707922Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2021-05-10T02:54:50.707929Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2021-05-10T02:54:50.710275Z 0 [Note] WSREP: wsrep_load(): Galera 3.49(r0ef0d79) by Codership Oy <info@codership.com> loaded successfully.
2021-05-10T02:54:50.710363Z 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-05-10T02:54:50.710923Z 0 [Note] WSREP: Found saved state: f074dcea-9adb-11eb-a592-46c651bdce73:8, safe_to_bootstrap: 0
2021-05-10T02:54:50.728798Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.101.0.61; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.freeze_purge_at_seqno = -1; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 9; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto;
2021-05-10T02:54:50.764128Z 0 [Note] WSREP: GCache history reset: f074dcea-9adb-11eb-a592-46c651bdce73:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.765806Z 0 [Note] WSREP: Assign initial position for certification: 8, protocol version: -1
2021-05-10T02:54:50.765934Z 0 [Note] WSREP: Preparing to initiate SST/IST
2021-05-10T02:54:50.765952Z 0 [Note] WSREP: Starting replication
2021-05-10T02:54:50.765977Z 0 [Note] WSREP: Setting initial position to f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T02:54:50.766254Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-05-10T02:54:50.766390Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-05-10T02:54:50.766540Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766550Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2021-05-10T02:54:50.766804Z 0 [Note] WSREP: GMCast version 0
2021-05-10T02:54:50.778574Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2021-05-10T02:54:50.778624Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2021-05-10T02:54:50.779105Z 0 [Note] WSREP: EVS version 0
2021-05-10T02:54:50.779222Z 0 [Note] WSREP: gcomm: connecting to group 'test', peer 'test-percona-perconaxtradb-vm-0:,test-percona-perconaxtradb-vm-1:,test-percona-perconaxtradb-vm-2:'
2021-05-10T02:54:50.779864Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.101.0.61:4567
2021-05-10T02:54:50.785442Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') connection established to 55248078 tcp://10.101.0.62:4567
2021-05-10T02:54:50.785551Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-05-10T02:54:51.036765Z 0 [Note] WSREP: declaring 55248078 at tcp://10.101.0.62:4567 stable
2021-05-10T02:54:51.037502Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,16bdf161,6)
memb {
16bdf161,0
55248078,0
}
joined {
}
left {
}
partitioned {
6263777e,0
66f31338,0
}
)
2021-05-10T02:54:54.281022Z 0 [Note] WSREP: (16bdf161, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-05-10T02:55:21.290606Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():161
2021-05-10T02:55:21.290657Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2021-05-10T02:55:21.290817Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1527: Failed to open channel 'test' at 'gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2': -110 (Connection timed out)
2021-05-10T02:55:21.290832Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2021-05-10T02:55:21.290841Z 0 [ERROR] WSREP: Provider/Node (gcomm://test-percona-perconaxtradb-vm-0,test-percona-perconaxtradb-vm-1,test-percona-perconaxtradb-vm-2) failed to establish connection with cluster (reason: 7)
2021-05-10T02:55:21.290846Z 0 [ERROR] Aborting
2021-05-10T02:55:21.290867Z 0 [Note] Giving 0 client threads a chance to die gracefully
2021-05-10T02:55:21.290875Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2021-05-10T02:55:21.290879Z 0 [Note] WSREP: Service disconnected.
2021-05-10T02:55:21.290882Z 0 [Note] WSREP: Waiting to close threads......
2021-05-10T02:55:26.291127Z 0 [Note] WSREP: Some threads may fail to exit.
2021-05-10T02:55:26.302798Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T02:55:26.302932Z 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2021-05-10T02:55:26.303022Z 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 0)
2021-05-10T02:55:26.306561Z 0 [Note] Binlog end
2021-05-10T02:55:26.306721Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
I saw it try SST, but it shows "WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory)".
Is it the main problem? How can I resolve it?
When node1 is running, go to node2 and run `telnet node1 4567`. Do you get a response, or does it time out? If it times out, you have a firewall/network issue.
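Beyond 4567 (group communication), SST also needs port 4444 and IST needs 4568. A quick sweep of all three from node2 (the hostname below is taken from your logs):

```shell
# Check every port the cluster needs on node1; -z = scan only, -w 3 = 3s timeout
for port in 4567 4568 4444; do
    nc -z -w 3 test-percona-perconaxtradb-vm-0 "$port" \
        && echo "port $port open" \
        || echo "port $port BLOCKED"
done
```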
root@test-percona-perconaxtradb-vm-1:~# nmap 10.101.0.66 -p 4567
Starting Nmap 7.40 ( https://nmap.org ) at 2021-05-11 01:50 UTC
Nmap scan report for test-percona-perconaxtradb-vm-0.c.smeeth-app.internal (10.101.0.66)
Host is up (0.0016s latency).
PORT STATE SERVICE
4567/tcp open tram
Nmap done: 1 IP address (1 host up) scanned in 0.54 seconds
And I tried again. Here are some logs.
2021-05-10T10:06:59.725982Z 1 [Note] WSREP: State transfer required:
Group state: f074dcea-9adb-11eb-a592-46c651bdce73:8
Local state: 00000000-0000-0000-0000-000000000000:-1
2021-05-10T10:06:59.725998Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726002Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-05-10T10:06:59.726013Z 1 [Note] WSREP: New cluster view: global state: f074dcea-9adb-11eb-a592-46c651bdce73:8, view# 5: Primary, number of nodes: 2, my index: 0, protocol version 3
2021-05-10T10:06:59.726016Z 1 [Note] WSREP: Setting wsrep_ready to true
2021-05-10T10:06:59.726020Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-05-10T10:06:59.726023Z 1 [Note] WSREP: Setting wsrep_ready to false
2021-05-10T10:06:59.726118Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57' '' )
2021-05-10T10:07:00.187506Z WSREP_SST: [INFO] Streaming with xbstream
2021-05-10T10:07:00.507815Z 1 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|10.101.0.64:4444/xtrabackup_sst//1
2021-05-10T10:07:00.507851Z 1 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 1) (Increment: 1 -> 2)
2021-05-10T10:07:00.507863Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-05-10T10:07:00.507901Z 1 [Note] WSREP: Assign initial position for certification: 8, protocol version: 4
2021-05-10T10:07:00.507939Z 0 [Note] WSREP: Service thread queue flushed.
2021-05-10T10:07:00.507956Z 1 [Note] WSREP: Check if state gap can be serviced using IST
2021-05-10T10:07:00.507971Z 1 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: f074dcea-9adb-11eb-a592-46c651bdce73
2021-05-10T10:07:00.508016Z 1 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2021-05-10T10:07:00.508024Z 1 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f074dcea-9adb-11eb-a592-46c651bdce73): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():516. IST will be unavailable.
2021-05-10T10:07:00.508766Z 0 [Note] WSREP: Member 0.0 (test-percona-perconaxtradb-vm-1) requested state transfer from '*any*'. Selected 1.0 (test-percona-perconaxtradb-vm-0)(SYNCED) as donor.
2021-05-10T10:07:00.508784Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8)
2021-05-10T10:07:00.508833Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-05-10T10:07:00.508852Z 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> f074dcea-9adb-11eb-a592-46c651bdce73:8
2021-05-10T10:07:01.292094Z WSREP_SST: [INFO] Streaming with xbstream
2021-05-10T10:07:01.302486Z WSREP_SST: [INFO] Proceeding with SST.........
2021-05-10T10:07:01.317911Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2021-05-10T10:07:02.224884Z 0 [Note] WSREP: (7500ee6f, 'tcp://0.0.0.0:4567') turning message relay requesting off
2021-05-10T10:07:13.813079Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2021-05-10T10:07:13.814324Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2021-05-10T10:07:13.815563Z WSREP_SST: [ERROR] ******************************************************
2021-05-10T10:07:13.816979Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2021-05-10T10:07:13.824639Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.101.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '5562' --mysqld-version '5.7.33-36-57' '' : 2 (No such file or directory)
2021-05-10T10:07:13.824673Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2021-05-10T10:07:13.824680Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2021-05-10T10:07:13.824715Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2021-05-10T10:07:13.824725Z 0 [ERROR] Aborting
It's not telling you to look at a file called DONOR.log; it's telling you to look at the log file on the DONOR node (test-percona-perconaxtradb-vm-0, according to your log). So go there and look in mysql's data directory. There should be some logs in there for mysql and for the xtrabackup process.
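A hedged sketch of where to look on the donor (the datadir path is an assumption, and the SST log file name can vary by version; `innobackup.backup.log` is what the xtrabackup-v2 SST script typically writes):

```shell
# Most recently written files in the datadir usually include the SST logs
ls -lt /var/lib/mysql/ | head

# Scan the donor-side xtrabackup log for the actual failure
sudo grep -i -E 'error|fatal' /var/lib/mysql/innobackup.backup.log
```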
Hello. I have been practicing these days, but it often fails. I have one more question.
When I have the new cluster and stop all its nodes before restoring the backup,
should I delete all the mysql data on all nodes?
So my steps are:
1. xtrabackup --backup --target-dir=/data/backups/
2. rsync to the new server
3. xtrabackup --prepare
4. stop mysql on all nodes of the new cluster
5. rm -r /data/mysql/*
6. copy-back the data
7. chown mysql
8. start mysql
I want to ask: after stopping mysql everywhere, should I delete the mysql data on all nodes, or only on the single node where I run --prepare and --copy-back?
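For reference, the steps above can be sketched end to end (paths are assumptions; strictly, only the datadir on the node you copy-back to must be emptied, since SST wipes the joiners' datadirs anyway, though clearing them does no harm):

```shell
# 1. On the old cluster (mysqld may keep running here)
xtrabackup --backup --target-dir=/data/backups/full

# 2. Ship the backup to node1 of the new cluster (hostname assumed)
rsync -a /data/backups/full/ node1:/data/backups/full/

# 3. On node1, with mysql stopped on ALL new-cluster nodes:
xtrabackup --prepare --target-dir=/data/backups/full
sudo rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/data/backups/full --datadir=/var/lib/mysql
sudo chown -R mysql:mysql /var/lib/mysql

# 4. Bootstrap node1, then start node2/node3 and let SST rebuild them
sudo systemctl start mysql@bootstrap.service
```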
My scenario is similar to the OP's, but I'm having trouble restoring an xtrabackup to a freshly set-up cluster using this guide. So far there are no databases; everything is default and as per the configs from the how-to I followed.
The backup was taken from an older environment running XtraDB Cluster 5.7, using xtrabackup 2.4. I streamed the backup file to a node in the new cluster and extracted it there. I installed xtrabackup 2.4 on the new node, as I had trouble running the prepare phase with 8.0.
First I stop mysql on all the nodes. After extracting and preparing, I clear the mysql data directory and copy over the prepared backup files. At this point node 1 contains only the backup files.
I then bootstrap this node and it fails. Is there a guide somewhere that I haven't come across in the last year of trying to get this right?
root@db-node-01:~# service mysql@bootstrap.service start
Job for mysql@bootstrap.service.service failed because of unavailable resources or another system error.
See "systemctl status mysql@bootstrap.service.service" and "journalctl -xe" for details.
root@db-node-01:~# journalctl -xe
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to load environment files: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed to run 'start-pre' task: No such file or directory
Apr 22 15:40:07 db-node-01 systemd[1]: mysql@bootstrap.service.service: Failed with result 'resources'.
Looking at the log files, I see errors about the SSL certificates that were generated when I initially installed PXC. Luckily I had saved those, so I moved them back and tried again; no luck.
I just wish there were a simple guide to follow. I would have thought this is a very common usage scenario, i.e. back up on an old 5.7 cluster and restore on a new 8.0 cluster, but alas I cannot find anything.