Best way to take AWS Aurora MySQL backups using XtraBackup for >10TB databases?

I plan to take backups of an Aurora RDS cluster of ~20TB; mysqldump (being single-threaded) definitely falls short on speed for databases this large. What's the best practice for backing up large Aurora RDS MySQL clusters?
Migration is certainly not my priority (so I probably won't need to tweak binary log replication); I only want a backup of the Aurora MySQL cluster using XtraBackup.

Also, if you have tried backup (Aurora MySQL) and restore (vanilla MySQL) operations on large clusters, I would appreciate it if you could share your experience around best practices and pitfalls to consider.

Hello @data_freak,

You cannot use XtraBackup with RDS, since you do not have direct access to the disk subsystem (XtraBackup needs filesystem-level access to the MySQL data directory, which RDS and Aurora do not expose).

You need to use mydumper. Here is a real-world example that I used just 2 weeks ago with a customer migrating 4.5TB out of RDS back to EC2. It took about 28 hours to complete the dump.

mydumper -c --regex '^(?!(mysql\.|sys\.))' --rows 1000:10000:0 -t 16 \
-h read-replica.blahblah.eu-west-1.rds.amazonaws.com -u dbadmin \
-p 'dbpass' -G -R -E --hex-blob -a -v 4 -o /mnt/sqldump \
-L ~/dbprod_20230724.log -s 2000000

-t 16 runs 16 parallel dump threads. The regex is a negative lookahead that tells mydumper NOT to dump any mysql.* or sys.* tables. -c compresses the output files, --rows chunks each table so multiple threads can work on a large table in parallel, -G/-R/-E include triggers, routines, and events, --hex-blob dumps binary columns as hex, and -s caps each INSERT statement at roughly 2 MB.

You can then load into another MySQL using myloader.
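If it helps, a minimal myloader sketch for the restore side (hostname and credentials are placeholders; the directory matches the dump above):

myloader -d /mnt/sqldump -t 16 \
-h target-mysql.example.com -u dbadmin \
-p 'dbpass' -o -v 3

Note that -o (--overwrite-tables) drops and recreates any table that already exists on the target, so only point it at a server you intend to overwrite.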


Thanks @matthewb, a few more follow-up questions:
1. I see the dump was taken from one of the read replicas; were writes on this DB cluster (coming in via the writer node) stopped during the dump? In one of my experiments (a mysqldump with --single-transaction), I saw RollbackSegmentHistoryListLength increasing heavily on the writer instance during the dump.

2. What's the recommended approach for mydumper: do writes need to be stopped for the duration of the dump, or does --no-locks work seamlessly from any active reader?

Thanks again for sharing the recent experience; it definitely helps.

Hey @data_freak,
#1. Yes, writes must be stopped manually. RDS does not allow you to use the standard DBA tools to get around this, so you must stop replication for the duration of the dump to ensure a transactionally consistent dump.

The growth you saw is most likely because mysqldump's --single-transaction holds a single long-running consistent-read snapshot open for the entire dump, which prevents InnoDB from purging undo history while the dump runs.
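If you want to watch that purge backlog from SQL while a dump runs, this query works (trx_rseg_history_len is one of the InnoDB metrics enabled by default in MySQL 5.7+):

-- InnoDB purge backlog; this climbs while a long snapshot is held open
SELECT NAME, COUNT
FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';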

#2. On a standard MySQL server, mydumper can synchronize its own threads using a combination of FTWRL, LOCK TABLES FOR BACKUP (on Percona Server), etc., all without needing to halt writes. But RDS doesn't allow the use of these features, so the only way to get a transactionally consistent dump in RDS is to create a replica and then halt replication to stop writes.
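To make that concrete, the synchronization mydumper performs on a standard server looks roughly like this (a simplified sketch of the idea, not literal mydumper internals):

-- coordinator connection: briefly block writes at a single point in time
FLUSH TABLES WITH READ LOCK;
-- each worker connection then opens a snapshot at that same point
START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- once every worker holds its snapshot, the coordinator releases the lock
UNLOCK TABLES;

On RDS the master user lacks the privileges for FTWRL, which is exactly why you fall back to the replica-and-halt approach.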


Thanks again, Matthew, for sharing the knowledge. A few follow-up questions:

“a customer migrating 4.5TB out of RDS back to EC2. It took about 28 hours to complete the dump”

Was it RDS Aurora MySQL or RDS MySQL?

so the only way to get a transactionally consistent dump in RDS is to create a replica and then halt replication to stop writes.

Yes, I’ve experienced that: in Aurora even the master user cannot acquire the global lock, so the snapshot will not be consistent (mydumper logs these as errors).

If you had used Aurora for the aforementioned customer case, could you please share the steps you followed to break replication? (There are a few suggestions around the web, but since you have done this recently it would be good to learn from you.)

1. Create a new replica
2. Halt replication (steps, or any specific privileges needed for the dump user?)
3. Execute the dump from the replica
4. Destroy the replica, or re-enable replication

@data_freak,
The customer in question was actually using RDS MariaDB. Your steps 1-4 are correct. mydumper was executed using the “admin” account that you create when you launch the RDS instance.
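For completeness, on RDS MySQL/MariaDB read replicas the halt and resume in steps 2 and 4 don't use STOP SLAVE (the admin user doesn't get SUPER); they go through the RDS stored procedures, run as that same admin account:

-- on the read replica, immediately before starting the dump
CALL mysql.rds_stop_replication;

-- after the dump completes, if you keep the replica instead of destroying it
CALL mysql.rds_start_replication;

One caveat for Aurora MySQL: reader instances share the cluster volume and have no binlog replication stream to stop, so this pattern applies to an external binlog replica (for example a separate RDS instance or EC2 server) rather than an in-cluster Aurora reader.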