PXC backup and restore workflow

Dear Percona experts,

I am looking into Galera Cluster backup and restore procedures and trying to better understand the workflow.

I have reviewed the documentation on creating and preparing full backups here:

However, I have a few specific questions about how this works in practice:

I. Which node should be used for backups? Is it necessary to run the backup script on all nodes, or is backing up just one node sufficient? Should it be a writer node?

II. If I run the restore command on one node (after preparing the backup), how is that data synchronized to the rest of the cluster? I assume the node where I ran the restore has to copy this data to the remaining nodes. Does this happen via IST or SST? Since it can be a huge amount of data, I want to understand the mechanism.

III. In a scenario where the whole cluster has crashed and I need to bootstrap: Is it possible to take a backup, prepare it, restore it on one node, and then join the other nodes to the “new” restored cluster? Is restoring from a backup treated differently than a standard bootstrap after a crash?

Could you simply explain the data flow when a restore is executed? From which node to which node does the data move?

Your guidance would be much appreciated!

Technically, it does not matter which node you use, as all nodes have the same data. Backing up one node is sufficient.

Ensure that you’ve recreated the grastate.dat file from the backup metadata. This will contain the UUID and sequence number of the backup. When you start mysql, an IST will take place to catch-up this node.
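As a rough illustration of recreating grastate.dat from the backup metadata, here is a minimal Python sketch. It assumes the backup directory contains a file named xtrabackup_galera_info holding a single `uuid:seqno` line, and that the grastate.dat layout (including the `version: 2.1` line) matches what your PXC version writes; check an existing grastate.dat on a healthy node before relying on it. The function name is hypothetical.

```python
def grastate_from_galera_info(info_text: str, safe_to_bootstrap: int = 0) -> str:
    """Build grastate.dat contents from a 'uuid:seqno' line.

    Assumption: info_text is the content of xtrabackup_galera_info,
    whose first line looks like '<cluster-uuid>:<seqno>'.
    """
    lines = info_text.strip().splitlines()
    if not lines:
        raise ValueError("galera info is empty")

    # Split on the LAST colon: the UUID contains hyphens but no colons.
    uuid, sep, seqno = lines[0].strip().rpartition(":")
    if not sep or not uuid:
        raise ValueError("expected '<uuid>:<seqno>'")
    int(seqno)  # raises ValueError if the seqno is not an integer

    # Assumed layout; compare with a grastate.dat from a running node.
    return (
        "# GALERA saved state\n"
        "version: 2.1\n"
        f"uuid:    {uuid}\n"
        f"seqno:   {seqno}\n"
        f"safe_to_bootstrap: {safe_to_bootstrap}\n"
    )


if __name__ == "__main__":
    # Example value only; use the line from your own backup.
    print(grastate_from_galera_info("6b1f0a2e-0000-11ee-8c99-0242ac120002:12345"))
```

The output would be written to grastate.dat in the restored data directory (owned by the mysql user) before starting the node.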

Yes, in the event all nodes are crashed and unrecoverable, perform a backup restore and bootstrap the first node. Once online, start node2 and wait for SST to complete. Then start node3.

No, restoring from a backup is not treated differently from a standard bootstrap after a crash.

Data moves from DONOR to JOINER. The JOINER is the node that is recovering or joining the cluster. The DONOR can be any other online node, which “donates” the dataset.
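To make the IST versus SST choice more concrete, here is a simplified Python model of the decision. It is only a sketch of the general rule (same cluster UUID, and the donor still holds the missing write-sets in its gcache), not the actual Galera logic; the function and parameter names are hypothetical.

```python
def expected_transfer(
    joiner_uuid: str,
    joiner_seqno: int,
    cluster_uuid: str,
    cluster_seqno: int,
    donor_gcache_first_seqno: int,
) -> str:
    """Return 'none', 'IST' or 'SST' for a joining node (simplified model)."""
    # A different UUID or an unknown position (-1) means the joiner's
    # data cannot be trusted as a starting point: full copy needed.
    if joiner_uuid != cluster_uuid or joiner_seqno < 0:
        return "SST"

    # Already at the cluster's position: nothing to transfer.
    if joiner_seqno == cluster_seqno:
        return "none"

    # The joiner needs write-sets starting at joiner_seqno + 1.
    # IST is possible only if the donor's gcache still contains them.
    if donor_gcache_first_seqno <= joiner_seqno + 1:
        return "IST"
    return "SST"


if __name__ == "__main__":
    # Restored node at seqno 900, cluster at 1000, gcache starts at 500.
    print(expected_transfer("abc", 900, "abc", 1000, 500))
```

In this model, a node restored from an old backup still falls back to SST if the donor's gcache no longer reaches back to the backup's seqno.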


Thank you so much @matthewb. Your post helped me better understand the topic.

However, I would like to ask a little more about the restore options:

Ensure that you’ve recreated the grastate.dat file from the backup metadata. This will contain the UUID and sequence number of the backup. When you start mysql, an IST will take place to catch-up this node.

Does this mean that in order to restore the node, it is necessary to manually create grastate.dat? If so, I would recommend updating the Percona documentation with this step, because without it, the restore procedure fails (a full SST takes place, so the backup is effectively not used). Correct me if I am mistaken.


In general, there are at least 5 types of restore scenarios:

A: One node restore - After the restore command is executed, mysql is started, the node joins the cluster, and IST takes place.

B: Whole cluster restore (Disaster Recovery) - One node is restored and bootstrapped, and the remaining nodes join the cluster via SST.

C: Sandbox restore - The node is restored on a single detached instance that is not part of the cluster (used to verify if backed-up data is usable).

D: Partial recovery - E.g., if some databases were removed, it is possible to restore the node offline (Option C) and import/write the needed data back to the cluster as a new database or table.

E: Individual table restore (as per Percona documentation).

Is this summary correct?

A: Yes
B: 90% yes. If done correctly, the other nodes should use IST, not SST. This is why it is important to bootstrap the node with the highest seqno.
C: Yes
D: Yes, using an ETL-style process, or logical backup tool to dump/load the data
E: Only with a logical dump/load. Using TRANSPORTABLE TABLESPACES in PXC is a recipe for disaster.
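For scenario B, the "highest seqno" check can be sketched in Python as follows. It assumes you have collected the grastate.dat text from each node by hand; the node names and helper functions are hypothetical. A seqno of -1 (typical after an unclean shutdown) is rejected here, since in that case the real position has to be recovered first, for example with `mysqld --wsrep-recover`.

```python
def parse_grastate(text: str) -> dict:
    """Parse 'key: value' lines from grastate.dat, skipping comments."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastates: dict) -> str:
    """Return the node name whose grastate.dat has the highest seqno."""
    best_node, best_seqno = None, None
    uuids = set()
    for node, text in grastates.items():
        state = parse_grastate(text)
        seqno = int(state["seqno"])
        if seqno < 0:
            # Position unknown; recover it before comparing nodes.
            raise ValueError(f"{node}: seqno is {seqno}, recover it first")
        uuids.add(state["uuid"])
        if best_seqno is None or seqno > best_seqno:
            best_node, best_seqno = node, seqno
    if len(uuids) > 1:
        raise ValueError("nodes report different cluster UUIDs")
    if best_node is None:
        raise ValueError("no nodes given")
    return best_node


if __name__ == "__main__":
    template = "# GALERA saved state\nversion: 2.1\nuuid: abc\nseqno: {}\n"
    nodes = {n: template.format(s) for n, s in
             [("node1", 100), ("node2", 250), ("node3", 180)]}
    print(pick_bootstrap_node(nodes))
```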
