xtrabackup fails apply export, missing tablespaces not in partial backup

In our business we have very significant needs to migrate databases from an existing db instance that contain thousands of other databases to other db instances that already exist and contain their own databases. In summary shifting databases for either lack of storage on the source database instance or for shifting same databases workload to other less utilized db servers.

Note these databases range from 100GB to 6TB in size so alternatives such as mysqldump are not an option, as well as, we require a solution that carries all the changes that occur during the backup to the target system. Obviously we have to do this while all systems are live and receiving 100’s of millions of writes.

It is bad enough for partial restores to have to do manual restore tasks per table and per table partition to accomplish the partial migrations, such as, create empty tables, discard tablespaces, copy partial backed up files, and then import tables spaces. Since we are dealing with thousands of databases per db instance to date we have chosen to not migrate say half of one database to another existing database, but to a new db server where the datadir is empty. We do full backups and then followed with dropping half the databases on each of the source and target systems, but this solution is expensive since it requires purchase setup of new servers while we have existing servers with available storage/cpu/mem resources and it risks the business dropping that many databases during cleanup to free up storage.

So we are just beginning to experiment with automating the above nightmarish manual partial restore process.

The issue we are experiencing is after performing a partial backup using a list of tables referenced in --tables-file= to the target server. During the apply step the backup fails, repeatedly on same error, where it is expecting tablespaces to exist for tables not included in the --tables-file. It appears that the apply log step is processing all data that changed on the source server, to the target server when it should be ignoring the thousands of databases that will remain on the source server.

One would think since the backup step can tell you its skipping the 1000’s of databases that the apply could also skip apply the same respective log entries.
innobackupex could easily transmit the --tables-file as its filter for the apply step.

The error is below, note we have seen this error many times on multiple 5.6.22 thru .28 percona setups as well as on the current problem server Server version: 10.1.14-MariaDB-enterprise MariaDB Enterprise Certified Binary The crazy aspect is sometimes it has worked a year or so back.

InnoDB: 5.7.10 started; log sequence number 128760261672799
InnoDB: page_cleaner: 1000ms intended loop took 34730ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
InnoDB: not started
xtrabackup: export option is specified.
InnoDB: Failed to find tablespace for table db_13547.identities in the cache. Attempting to load the tablespace with space id 770333
InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: Cannot open datafile for read-only: ‘./db_13547/identities.ibd’ OS error: 71
InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: Could not find a valid tablespace file for db_13547/identities. Please refer to [url]http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html[/url] for how to resolve the issue.
00:06:56 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 0 thread_stack 0x10000
innobackupex(my_print_
stacktrace+0x35)[0xd14475]
innobackupex(handle_fatal_signal+0x273)[0x992273]
/lib64/libpthread.so.0(+0xf710)[0x7fa177f23710]
innobackupex(_Z10ER_DEFAULTi+0x10)[0xac82a0]
innobackupex(_Z16explain_filenameP3THDPKcPcm26enum_explain_filename_mode+0x3dc)[0xa0d21c]
innobackupex(_Z27innobase_convert_identifierPcmPKcmP3THD+0x72)[0x779ef2]
innobackupex(_Z11ut_get_namePK5trx_tPKc+0x59)[0x87c7d9]
innobackupex(_ZlsRSoRK12table_name_t+0x1f)[0x8ca7df]
innobackupex[0x7d5882]
innobackupex(_Z15dict_load_tablePKcb17dict_err_ignore_t+0x1ab)[0x7d5aab]
innobackupex(_Z21dict_load_table_on_idm17dict_err_ignore_t+0x67b)[0x7dc73b]
innobackupex(_Z21dict_table_open_on_idmm15dict_table_op_t+0xf8)[0x897f38]
innobackupex(_Z14row_purge_stepP9que_thr_t+0x207)[0x8c69d7]
innobackupex(_Z15que_run_threadsP9que_thr_t+0x99f)[0x7a640f]
innobackupex(srv_worker_thread+0x1e9)[0x92d4f9]
/lib64/libpthread.so.0(+0x79d1)[0x7fa177f1b9d1]
/lib64/libc.so.6(clone+0x6d)[0x7fa1763ea9dd]

Any and all help is very much apppreciated!

Scott