Innobackupex - xtrabackup_logfile got the error: Errcode: 116 - Stale file handle at the last step

kart · June 16, 2024, 10:20am

Hi everyone,

We’re encountering an issue with our database backups, which can take several days due to the large size (10TB to 40TB). The backups are written to an NFS share. Unfortunately, at the very end of the process, when writing information to the xtrabackup_logfile, we’re getting a “Stale file handle” error (Errcode: 116).

This error renders the entire backup unusable, and retries haven’t been successful. We suspect the issue might be related to the xtrabackup_logfile’s file handle becoming outdated during the lengthy backup process.

Here’s our question:

Is there a way to handle the “stale file handle” error during xtrabackup backups? Could we potentially retry with a different approach instead of relying on the original file handle created at the beginning (which can be as old as 48 hours)?

If anyone has encountered this error or has suggestions on how to avoid it, your input would be greatly appreciated!

matthewb · June 16, 2024, 3:02pm

Hello @kart,
You are not the first one who has posted on the forums about backup to NFS having stale files. This seems to be a common issue with NFS-based backups.

I would recommend that you take advantage of multiple PXB streams to back up your database much faster than “several days”.

With this approach, the files are held in memory and streamed as data is written rather than holding open a file descriptor which can go stale.
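As a rough illustration of the streaming idea, here is a minimal Python sketch that pipes an xbstream backup over SSH to another host instead of writing to the NFS mount. It assumes xtrabackup is on the source host and xbstream on the receiving host; the host name, directories, and the helper names build_stream_pipeline and run_stream_backup are hypothetical, and using --parallel for the "multiple streams" part is my assumption, so check the options against your PXB version.

```python
import shlex
import subprocess


def build_stream_pipeline(remote_host, remote_dir, parallel=8, tmp_dir="/tmp"):
    """Return (backup_cmd, receive_cmd) for a streamed backup over SSH.

    remote_host, remote_dir and tmp_dir are placeholders for illustration.
    """
    # xtrabackup writes the whole backup to stdout in xbstream format;
    # --parallel sets how many threads copy data files concurrently.
    backup_cmd = [
        "xtrabackup",
        "--backup",
        "--stream=xbstream",
        f"--parallel={parallel}",
        f"--target-dir={tmp_dir}",
    ]
    # On the receiving host, "xbstream -x" unpacks the stream into remote_dir.
    remote_cmd = f"xbstream -x -C {shlex.quote(remote_dir)}"
    receive_cmd = ["ssh", remote_host, remote_cmd]
    return backup_cmd, receive_cmd


def run_stream_backup(remote_host, remote_dir, parallel=8):
    """Run the pipeline and fail loudly if either side exits non-zero."""
    backup_cmd, receive_cmd = build_stream_pipeline(remote_host, remote_dir, parallel)
    backup = subprocess.Popen(backup_cmd, stdout=subprocess.PIPE)
    receive = subprocess.Popen(receive_cmd, stdin=backup.stdout)
    # Close our copy of the pipe so xtrabackup sees a broken pipe
    # if the receiving side goes away.
    backup.stdout.close()
    receive_rc = receive.wait()
    backup_rc = backup.wait()
    if backup_rc != 0 or receive_rc != 0:
        raise RuntimeError(
            f"stream backup failed: xtrabackup={backup_rc}, receiver={receive_rc}"
        )
```

Something like run_stream_backup("backup-host", "/backups/db1") would then start the transfer; checking both exit codes matters because a failure on either side leaves the backup unusable.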

For a database of that size, the better practice is to use filesystem or volume snapshots such as ZFS, btrfs, or EBS (i.e., cloud snapshots), as you can take the snapshot in under 10s and then transfer it to another system in the background.

kart · June 17, 2024, 8:39am

Thank you @matthewb for the quick reply.

I will start exploring the multiple PXB Streams.

By the way, could you please elaborate on these FS snapshots? Can I restore from them the same way I do with innobackupex? Please advise, or point me to any materials to check. I will also search for it myself, thank you.

matthewb · June 17, 2024, 7:35pm

First check that your filesystem or storage supports snapshots. In session 1, run FLUSH TABLES WITH READ LOCK and keep that session open. In session 2, run the snapshot command (e.g., aws ec2 create-snapshot for an EBS volume). Once that returns, go back to session 1 and exit the session to release the lock. The snapshot is done in under 30s.

To restore, create a new server, restore the snapshot, start mysql.
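The lock-then-snapshot sequence above can be sketched in Python roughly as follows. This is only an outline under assumptions: conn is any DB-API style MySQL connection you already have, take_snapshot is a hypothetical callable that runs whatever snapshot command your storage uses, and the function names are made up for illustration.

```python
from contextlib import contextmanager


@contextmanager
def global_read_lock(conn):
    """Hold FLUSH TABLES WITH READ LOCK for the duration of the with-block.

    The lock belongs to this session, so the same connection must stay
    open until the snapshot command has returned.
    """
    cur = conn.cursor()
    cur.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield cur
    finally:
        # Release the lock even if the snapshot command raised.
        cur.execute("UNLOCK TABLES")
        cur.close()


def snapshot_under_lock(conn, take_snapshot):
    """Run take_snapshot() while the global read lock is held."""
    with global_read_lock(conn):
        return take_snapshot()
```

For example, take_snapshot could be a small function that calls subprocess.run on a command such as zfs snapshot for your dataset. The try/finally is there so a failed snapshot does not leave the server locked.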
