Sudden Failure of Automated Backups


Automated Xtrabackup jobs for our Percona MySQL cluster seem to have suddenly stopped working, even though nothing specific has changed in the backup procedures or the cluster's normal operating processes. The error log contains entries like this:

2017-02-13 02:00:02,007 - INFO - Pre-process for backup: Ensure the backup directory is present and empty…
2017-02-13 02:00:02,023 - INFO - Running innobackupex for the running instance…
2017-02-13 02:00:02,025 - INFO - … xtrabackup still running, check [/home/percona/backup.log/backup.log]
xbstream: Can't create/write to file '././backup-my.cnf' (Errcode: 17 - File exists)
xbstream: failed to create file.
2017-02-13 02:00:09,033 - ERROR - Caught this exception in Manager: Xtrabackup failed - please see log at [/home/percona/backup.log/backup.log]
2017-02-13 02:00:09,033 - INFO - Clean up any subprocesses before exit…
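The "Errcode: 17 - File exists" from xbstream typically means the extraction directory on the receiving side already contained files, e.g. leftovers from a failed run, or a previous backup still running because the job now takes longer as the data grows. A minimal diagnostic sketch (the target path is hypothetical, adjust to your setup):

```shell
#!/bin/sh
# Hypothetical extraction path on the backup server - adjust to your setup.
TARGET_DIR="${TARGET_DIR:-/data/backups/incoming}"

# xbstream will not overwrite existing files (Errcode: 17 - File exists),
# so the extraction directory must be empty before each run.
dir_is_empty() {
    [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

if ! dir_is_empty "$TARGET_DIR"; then
    echo "NOT EMPTY: $TARGET_DIR - clean up leftovers from a previous run"
fi

# A previous backup still in flight would also collide with the new one.
if pgrep -f 'xbstream|innobackupex' >/dev/null 2>&1; then
    echo "a backup process appears to be running already"
fi
```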

The backup.log states:

170213 02:00:02 innobackupex: Starting ibbackup with command: xtrabackup --defaults-extra-file="/home/percona/.my.cnf" --defaults-group="mysqld" --backup --suspend-at-end --target-dir=/tmp --innodb_data_file_path="ibdata1:12M:autoextend" --tmpdir=/tmp --extra-lsndir='/tmp' --stream=xbstream

innobackupex: Waiting for ibbackup (pid=5292) to suspend
innobackupex: Suspend file '/tmp/xtrabackup_suspended_2'

xtrabackup version 2.2.12 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 8726828)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 65535, set to 1024
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 4294967296
xtrabackup: using ALL_O_DIRECT

log scanned up to (12969935228387)
xtrabackup: Generating a list of tablespaces
[01] Streaming ./ibdata1
xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 2711
main::wait_for_ibbackup_file_create('/tmp/xtrabackup_suspended_2') called at /usr/bin/innobackupex line 2731
main::wait_for_ibbackup_suspend('/tmp/xtrabackup_suspended_2') called at /usr/bin/innobackupex line 1984
main::backup() called at /usr/bin/innobackupex line 1609
innobackupex: Error: The xtrabackup child process has died at /usr/bin/innobackupex line 2711.
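One detail in the log above may also matter as the database grows: xtrabackup requested an open-files limit of 65535 but was capped at 1024. A quick check of the limit versus the number of tablespace files xtrabackup must hold open (the datadir path is an assumption, adjust to your my.cnf):

```shell
#!/bin/sh
# Hypothetical datadir path - adjust to match your my.cnf.
DATADIR="${DATADIR:-/var/lib/mysql}"

# Count the InnoDB tablespace files xtrabackup must open during a backup.
count_ibd() {
    find "$1" -name '*.ibd' 2>/dev/null | wc -l
}

# backup.log shows "open files limit requested 65535, set to 1024"; if the
# tablespace count approaches that cap as the database grows, backups can
# start failing even though nothing in the procedure changed.
echo "open files limit (soft): $(ulimit -n)"
echo "tablespace files: $(count_ibd "$DATADIR")"
```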

The only thing I can think of that has changed over time is the size of the database backups, since the database is of course always growing. Could the backup buffer sizes need to be increased? More information: our backups stream daily from the secondary node of our 3-node cluster to another server where the backups are kept, and a test restore is then done from this backup server to a final server instance to verify that the backups restore correctly. Furthermore, our binlogs are continuously streamed from the same secondary node to the final restore server instance (the binlog streaming has not failed and continues to work properly, judging by the files present on the final restore server).
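Worth noting about the streaming setup: in a sender-to-receiver pipeline, "Broken pipe" (Errcode: 32) on the sending side is only a symptom that the receiver closed the stream first. A minimal demonstration of that mechanism (not your actual backup command; the pipeline in the comments is a generic xbstream-over-ssh sketch, your wrapper may differ):

```shell
#!/bin/sh
# In a streaming backup pipeline such as
#   innobackupex --stream=xbstream /tmp | ssh backup-host 'xbstream -x -C <dir>'
# the sender reports 'Broken pipe' only because the receiving xbstream died
# first - in this case on 'File exists'. The root cause is therefore on the
# receiving side, not in xtrabackup itself.

# 'yes' plays the writer (xtrabackup); 'head' plays a reader that exits
# early (xbstream). Once the reader exits, the writer gets SIGPIPE/EPIPE.
yes backup-chunk | head -n 3
```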

Topology Summary:
SqlCluster Node1/2/3: Node 2 Streams Backups Daily to -> Backup Keeper Server which restores to -> Final Restore server
Node 2 also Streams constant binlogs to -> Final Restore server

Any suggestions on what might be causing this sudden backup failure? The cluster has otherwise been running without issue, so it is perplexing. There is also plenty of free space relative to the amount of data we are backing up and test-restoring. Thanks in advance and have a great day! :)