Is there a way to force a refresh-style full backup?

Let’s say I do a full backup of a 300GB database to /my/base

Then I do 2 incrementals to inc1 and inc2.

After I prepare the base and apply the 2 incrementals, now I want to do another full backup to /my/base/.

Does the backup have to copy the whole 300GB database again, or is there some rsync-style behavior that only copies pages or blocks that are necessary? I’m aware of the --rsync flag, but I don’t know if that applies to my question.

We have about 20 TB of database and we don’t want to copy the whole thing to backup storage every time we want to do a full backup, but we have noticed that xtrabackup appears to want an empty target directory.

Same question for restores. Does the restore folder have to be empty, or can xtrabackup be forced to simply update existing files?


A full backup is exactly that, a full backup. It cannot be applied/merged to anything existing. Each time you want a full backup it will copy the entire 300GB; that is the very definition of a full backup.

The rsync flag you mention is only used for non-InnoDB files.

You should be able to use /my/base as the starting point for another incremental and just keep repeating this process of ‘take incremental, prepare/merge to previous base’:

  • Take a full to /my/base
  • Take inc1 to /my/inc1 (using /my/base as --incremental-basedir)
  • Take inc2 to /my/inc2 (using /my/inc1 as --incremental-basedir)
  • Apply/merge inc1 to /my/base
  • Apply/merge inc2 to /my/base
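Assuming Percona XtraBackup’s standard CLI flags, the sequence above can be sketched as a script (paths are illustrative). It only prints the commands (`RUN=echo`) so you can review them before running against a real server:

```shell
#!/bin/sh
# Sketch of the full + incremental + merge sequence described above.
# Paths are illustrative. RUN=echo makes the script print each command
# instead of executing it, so the sequence can be reviewed before use.
RUN=echo
BASE=/my/base

# Full backup into an empty directory
$RUN xtrabackup --backup --target-dir=$BASE

# Two incrementals, each relative to the previous copy
$RUN xtrabackup --backup --target-dir=/my/inc1 --incremental-basedir=$BASE
$RUN xtrabackup --backup --target-dir=/my/inc2 --incremental-basedir=/my/inc1

# Prepare the base, then merge each incremental into it.
# --apply-log-only on every merge keeps the base able to accept
# further incrementals; the final prepare happens at restore time.
$RUN xtrabackup --prepare --apply-log-only --target-dir=$BASE
$RUN xtrabackup --prepare --apply-log-only --target-dir=$BASE --incremental-dir=/my/inc1
$RUN xtrabackup --prepare --apply-log-only --target-dir=$BASE --incremental-dir=/my/inc2
```

Clear the `RUN=echo` guard (set `RUN=`) to actually execute the commands.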

Taking another full backup into /my/base at this point requires emptying the directory and copying everything again. Or, you can simply take another incremental using /my/base as --incremental-basedir, which does what you want (only copies the pages necessary), and store it in /my/inc3.

Restores cannot be done incrementally.


In my view, a full backup is when, at the conclusion of the operation, the destination is an exact duplicate of the source, on separate media. We accomplish that now as follows…

  1. Do flush tables with read lock and flush logs on the database.
  2. Take an LVM snapshot.
  3. Release the read lock
  4. Use rsync to copy the snapshot to the backup server

Steps 1-3 take one or two seconds total.

Step 4 depends on how much data has been changed. If the database is 300GB, but only 1 GB of changes took place today, then only 1 GB is transferred to the destination.
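The four steps above might be scripted roughly like this (device names, mount points, and the remote path are illustrative; `RUN=echo` prints the commands instead of executing them). One caveat not shown here: the read lock must be held by a single open client session across the snapshot, since a one-shot `mysql -e` releases it on disconnect:

```shell
#!/bin/sh
# Sketch of the snapshot-based "full backup" flow described above.
# Device names and paths are illustrative; RUN=echo prints each command
# instead of executing it. In a real script, FLUSH TABLES WITH READ LOCK
# must be issued from a session that stays open until after the snapshot.
RUN=echo

# Steps 1-2: quiesce writes, rotate logs, snapshot the datadir volume
$RUN mysql -e "FLUSH TABLES WITH READ LOCK; FLUSH LOGS;"
$RUN lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql

# Step 3: release the lock (closing the locking session also does this)
$RUN mysql -e "UNLOCK TABLES;"

# Step 4: mount the snapshot read-only and rsync only the changed data
$RUN mount -o ro /dev/vg0/mysql_snap /mnt/snap
$RUN rsync -a --delete /mnt/snap/ backup-host:/backups/mysql/
$RUN umount /mnt/snap
$RUN lvremove -f /dev/vg0/mysql_snap
```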

At the end of step 4, the backup server has an exact byte-for-byte, confirmed 100% perfect duplicate of the source database, but only a fraction of the data had to be copied. This produces the effect of a full backup with the speed of an incremental (though nowhere near as fast as xtrabackup’s incrementals, because xtrabackup can reference the changed-page bitmap, whereas rsync has to scan and compare every block of the database). All of this happens using the same destination folder. (We can then produce daily versioned backups with a second stage that uses rdiff-backup to create reverse differentials for however many days we feel is appropriate, but all of that happens on the backup server itself with no involvement of the DB server.)

I was hoping xtrabackup had similar functionality


xtrabackup accomplishes backups in a different way than what you are describing. But as I said above, you can still accomplish what you are after using xtrabackup by implementing “rolling incrementals” as I described. Thus, if only 1 GB of changes takes place, then the incremental for that day would only be 1GB. After taking the incremental, merge it with the previous incremental. You now have a full backup at the ready for restore.


Very true, but I think you run into attendant problems. First, if you do daily incrementals for a week, then to restore you have to apply the base + 6 incrementals. More time, more moving parts, more room for human or machine error. It becomes a balancing act: how long do you go between each full backup? If your databases are huge, doing them once a week could be a problem, but the longer you wait between them, the more incrementals you must apply. It becomes a game of how many sumo wrestlers can I fit into this phone booth before it breaks. Plus you inevitably get that question on audit questionnaires from customers: “How often do you do full backups?” The further apart they are, the more suspiciously potential customers look at you. With the method I described, we can truthfully tell customers we do full backups every night.


In what I described above, you would take the incremental and then immediately merge it, so no, you would not have to do all that. Restore would be 4 steps: final prepare, restore/copy-back, permissions, start mysql.
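Those four restore steps might look like this with xtrabackup’s standard flags (paths and the service name are illustrative; `RUN=echo` prints the commands for review instead of executing them):

```shell
#!/bin/sh
# Sketch of the four restore steps: final prepare, copy-back,
# permissions, start mysql. Paths are illustrative; RUN=echo prints
# each command instead of executing it.
RUN=echo
BASE=/my/base

$RUN xtrabackup --prepare --target-dir=$BASE     # final prepare (no --apply-log-only)
$RUN xtrabackup --copy-back --target-dir=$BASE   # the MySQL datadir must be empty
$RUN chown -R mysql:mysql /var/lib/mysql         # fix ownership after copy-back
$RUN systemctl start mysql
```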


But then the next time you do a full backup, it must be to an empty directory and would copy the entire database. If your databases are big, and you have a lot of them (we have about 500, many of which are in the 100-400GB range) that’s a lot.


The whole point of this post, your goal, was to avoid taking actual full backups, so why are you throwing this back as a negative? Take a full backup (empty dir) on the 1st, then each day take an incremental and immediately merge it. You now have a rolling full daily backup ready for immediate restore that only copies the day’s changes. On the next month, the 1st, take a new full and start over.


The goal is to avoid taking full backups where every byte is copied from the source to the destination. An rsync-based full backup only copies the changed bytes, but you still end up with the same result. We do want full backups, just not the kind you’re thinking of.


So does an incremental backup. Both solutions arrive at the same goal:

  • lvm snapshot, rsync: only copies bytes that changed since last snapshot (1GB), end result is “full backup” (300GB)
  • xtrabackup incremental: only copies bytes that changed since last incremental (1GB), merge with previous incremental, end result is a “full backup” (300GB)

Both solutions result in a “full backup” taken each day while only copying the changed bytes.


Boy, I am misunderstanding something terribly. At the end of week one, you have…

base
inc1
inc2
inc3
inc4
inc5
inc6

Then you do ‘prepare’ 7 times, and now you have a merged full backup using only the changed bytes. I got that. So then what? Remove folders inc1-6, and start over with base and start creating new incrementals? Because that would be okay.


That’s not what I’ve been describing.

base (Sunday)
inc1 (Monday): merge with base; base is now a “full backup” with Monday’s changes; erase inc1
inc1 (Tuesday): merge with base; base is now a “full backup” with Monday’s and Tuesday’s changes; erase inc1
inc1 (Wednesday): merge with base; base is now a “full backup” with Mon, Tue, and Wed changes; erase inc1
…etc

Each day you take an incremental relative to the previous day (which is “base”, because you are merging each day’s changes back into it), which only copies what changed. You immediately merge it, which results in a “full backup” taken every day while only copying the changes. You can do this forever. You don’t ever need to take another “base” unless you want to.
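The daily cycle described here can be sketched as a small cron-style script using xtrabackup’s standard flags (paths are illustrative; `RUN=echo` prints the commands for review instead of executing them):

```shell
#!/bin/sh
# Sketch of the daily "take incremental, merge immediately" cycle
# described above. Paths are illustrative; RUN=echo prints each
# command instead of executing it.
RUN=echo
BASE=/my/base
INC=/my/inc1

# 1. Take today's incremental relative to the merged base
$RUN xtrabackup --backup --target-dir=$INC --incremental-basedir=$BASE

# 2. Merge it into the base immediately. --apply-log-only keeps the
#    base able to accept tomorrow's incremental; the final prepare
#    happens only at restore time.
$RUN xtrabackup --prepare --apply-log-only --target-dir=$BASE --incremental-dir=$INC

# 3. Discard the incremental; the base is now current through today
$RUN rm -rf $INC
```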

What you described also works and accomplishes the same thing, but you said above that that was too many steps. My solution simplifies the steps by merging/preparing immediately after taking the incremental.


That makes sense. I’ll give that a shot. Sorry if I made this harder than it needed to be. I appreciate your patience!


I tested this out and got it working. It’s a little scary for me to base my DR strategy on one full backup and a long string of merged incrementals. If we do full backups once a month or something like that, it means we take an incremental and merge it daily for 30 consecutive days. I don’t know how bullet-proof xtrabackup is over time like that. It seems like it makes room for little glitches to creep in. Byte-level full backups are more confidence inspiring to me, but also much more time consuming. It’s a conundrum.
