I’m backing up a ~250G database with 140k tables, some of which are partitioned into ~100 partitions each. When I extract and prepare the tar archive, the minimum size of each .ibd file grows from 96K to 1M. This is especially painful in my situation, as all of the empty partitions also grow to 1M.
Overall, this causes the data set to nearly triple in size. Even with a larger storage volume, I would have to run OPTIMIZE TABLE on every single table to shrink the data set back to a manageable size.
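Running OPTIMIZE TABLE across 140k tables is only practical if scripted. A minimal sketch of one common approach: pull the table names from `information_schema` and turn them into OPTIMIZE statements. The schema name `mydb` is a placeholder, and the generator reads names from stdin so the transformation itself is easy to verify:

```shell
# gen_optimize: print an OPTIMIZE TABLE statement for each table name
# supplied on stdin, one name per line. In practice the names would
# come from something like (mydb is a placeholder schema):
#   mysql -N -e "SELECT TABLE_NAME FROM information_schema.TABLES
#                WHERE TABLE_SCHEMA='mydb'"
gen_optimize() {
  while IFS= read -r table; do
    printf 'OPTIMIZE TABLE `%s`;\n' "$table"
  done
}
```

The output could then be piped back into the client, e.g. `mysql -N -e "…" | gen_optimize | mysql mydb`. Note that for InnoDB, OPTIMIZE TABLE is mapped to a full table rebuild, so this is slow on a data set of this size.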
Is this a by-product of the tar stream? Or is it simply unavoidable with XtraBackup/innobackupex?
This is definitely occurring during the --apply-log stage: the data is 250G after extracting the archive, but grows to ~800G while --apply-log runs, at which point it fails with an error because my volume is only 800G.
I would be interested to hear whether you can reproduce the problem by taking a snapshot of the data (through an LVM snapshot, for instance) and then running crash recovery on the resulting snapshot by just starting MySQL. This would help indicate whether it is a problem in InnoDB or in XtraBackup. I think that is one of the most important things to determine.
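The suggested experiment can be sketched as a short command sequence. This is a dry run that only prints each step (remove the `echo`s to execute them); the volume group `vg0`, logical volume `mysql`, and mount point are assumptions to be adjusted for the actual setup:

```shell
# repro_sketch: print the steps for snapshotting the live datadir with
# LVM and running InnoDB crash recovery on the copy by starting mysqld
# against it. Dry-run only: each command is echoed, not executed.
repro_sketch() {
  # 1. Take an LVM snapshot of the volume holding the MySQL datadir
  #    (assumed here to be /dev/vg0/mysql).
  echo "lvcreate --snapshot --size 10G --name mysql-snap /dev/vg0/mysql"
  # 2. Mount the snapshot read-write somewhere separate.
  echo "mount /dev/vg0/mysql-snap /mnt/mysql-snap"
  # 3. Start a second mysqld on the snapshot; startup performs crash
  #    recovery. --skip-networking and a separate socket avoid
  #    colliding with the live server.
  echo "mysqld --datadir=/mnt/mysql-snap --socket=/tmp/snap.sock --skip-networking"
}
```

If the .ibd files also balloon after plain crash recovery on the snapshot, the growth is coming from InnoDB itself rather than from XtraBackup's --apply-log.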