Looking at an inherited backup protocol that seems to not be working as intended, and would simply like to confirm what it looks like is happening. We have a script that is a wrapper for xtrabackup. At one point, it seems that the target_dir for this was set to /var/lib/mysql/backups/base and some backups were completed, or copied into, this location. Now, the backups are failing with tablespace ID errors similar to the following:
If one looks in these paths, those files do exist, and because they are apparently xtrabackup sets within /var/lib/mysql it looks like when xtrabackup goes to run, it sees these directories as databases it needs to back up on top of the other databases that are in /var/lib/mysql - which of course are duplicate table id’s because it’s an older copy of the same data.
Is this how xtrabackup works? Is it safe to delete or move the now months old backup files? Thanks for any advice, happy to answer with any other details that might be requested.
If one looks in these paths, those files do exist, and because they are apparently xtrabackup sets within /var/lib/mysql it looks like when xtrabackup goes to run, it sees these directories as databases it needs to back up on top of the other databases that are in /var/lib/mysql - which of course are duplicate table id’s because it’s an older copy of the same data.
Here, you are saying backup was taken into MySQL data directory (by mistake), which is problmatic. Any directory in MySQL data-dir will be considered as a database.
For this, I would suggest the following,
Take a logical(mysqldump) backup of all databases. (this backup will be useful in case MySQL refuses to start later due to unwanted files/directories in data directory)
Shutdown mysql and take a cold backup of data-dir by copying it as a backup.
After this, you can remove unwanted files/directories that were created by xtrabackup in the MySQL data directory. This is a crucial step to make sure you Identify the correct files/ directories for removal.
And Start MySQL service again.
For backups: --target-dir should be empty for new backup. As you mentioned you are using script, I would suggest adding logic to the script to create a backup target-dir with the current DateTime like below,
BACKUP_DIR="/backups/mysql/full"
#Get current date for filename
TODAY=$(date +%Y-%m-%d)
FULL_BACKUP_FILENAME="${BACKUP_DIR}/${TODAY}"
Is this how xtrabackup works? Is it safe to delete or move the now months old backup files? Thanks for any advice, happy to answer with any other details that might be requested.
Yes, from the MySQL data directory, but as mentioned above, after taking a logical(mysqldump) backup and data dir backup.
Thank you this helps very much and your approach to fixing it is careful and considered, which I appreciate.
I can confirm that the existent script originally had the target_dir set to /var/lib/mysql/backups/base and all of the files causing the errors are within that directory. It does have the logic in it to put backups into date labeled new empty directories, so that’s a positive thing that won’t need fixing.
This cluster is part of a xtradb cluster, does that change the procedure at all?
Are all PXC nodes affected by this? Case1: If only one node has an issue where backup files are mixed in the data directory, then you can rebuild that node from another node by doing SST / rebuilding it from other nodes. Before that, It is very important to check if the other PXC nodes have these backup files in the data directory or not.
If yes, xtrabackup(SST) will fail with the same issue.
If No, where other nodes are unaffected, simply doing SST from them would fix this issue.
Case2: If all nodes are affected as well by having backup files in the data directory and this is a PXC cluster, then you can fix one node at a time using the below steps,
Make sure all nodes are part of the cluster and in sync. This is important because other running nodes can be used later for restoration if something goes wrong.
Shutdown MySQL.
After this, you can remove unwanted files/directories that were created by xtrabackup in the MySQL data directory. This is a crucial step to make sure you Identify the correct files/ directories for removal.
And Start MySQL service again.
Verify the cluster status, making sure the node joins the cluster and is in a healthy state.
Fix issue on other nodes the same way (one at a time).
The benefit of having a PXC cluster is that all nodes are primary, and if anything goes wrong on one node, we can use other nodes as backup for recovery.