PerconaFT recovery fail after disk failure

We had a disk failure on a RAID array which caused our server to go down. On rebuilding the array we cannot start up percona due to a problem with the recovery log file. We have to start innodb with innodb_force_recovery to reach the tokudb loading stage but then receive this error (after innodb startup):

2018-01-16T11:34:32.800498Z 0 [Note] InnoDB: Buffer pool(s) load completed at 180116 11:34:32
Tue Jan 16 11:34:32 2018 PerconaFT recovery starting in env /var/local/percona/mysql/current/
Tue Jan 16 11:34:32 2018 PerconaFT recovery failed -30975
2018-01-16T11:34:32.896964Z 0 [ERROR] TokuDB unknown error -30975
/mnt/workspace/percona-server-5.7-binaries-release/label_exp/debian-wheezy-x64/percona-server-5.7.18-15/storage/tokudb/PerconaFT/portability/file.cc:425 file_fsync_internal: Assertion `get_error
_errno() == EINTR' failed (errno=22)
: Invalid argument
Backtrace: (Note: toku_do_assert=0x0x7fca994bb010)
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z19db_env_do_backtraceP8_IO_FILE+0x1b)[0x7fca994b9fcb]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x91153)[0x7fca994ba153]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x8fb1a)[0x7fca994b8b1a]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x916b4)[0x7fca994ba6b4]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z23toku_logger_maybe_fsyncP10tokulogger10__toku_lsnib+0x1e7)[0x7fca994fa007]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z23toku_log_end_checkpointP10tokuloggerP10__toku_lsniS1_mjj+0x141)[0x7fca9951fb71]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_ZN12checkpointer18log_end_checkpointEv+0x31)[0x7fca9951ff71]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_ZN12checkpointer14end_checkpointEPFvPvES0_+0x66)[0x7fca9951dbc6]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z15toku_checkpointP12checkpointerP10tokuloggerPFvPvES3_S5_S3_19checkpoint_caller_t+0x1d8)[0x7fca9951dde8]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x11afe5)[0x7fca99543fe5]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x66c69)[0x7fca9948fc69]
/opt/percona/mysql/current/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x51)[0x99f9f1]
/opt/percona/mysql/current/bin/mysqld[0xe1c6c6]
/opt/percona/mysql/current/bin/mysqld[0xe1e627]
/opt/percona/mysql/current/bin/mysqld(_Z11plugin_initPiPPci+0x949)[0xe224d9]
/opt/percona/mysql/current/bin/mysqld(_Z11mysqld_mainiPPc+0xab8)[0x9402d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fcab487e830]
/opt/percona/mysql/current/bin/mysqld[0x937955]
Engine status function not available
Memory usage:
Arena 0:
system bytes = 0
in use bytes = 0
Total (incl. mmap):
system bytes = 0
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
11:34:32 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=215
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 93396 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/opt/percona/mysql/current/bin/mysqld(my_print_stacktrace+0x2e)[0xffbc1e]
/opt/percona/mysql/current/bin/mysqld(handle_fatal_signal+0x4a1)[0x9475d1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fcab67f2390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fcab4893428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fcab489502a]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x91158)[0x7fca994ba158]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x8fb1a)[0x7fca994b8b1a]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x916b4)[0x7fca994ba6b4]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z23toku_logger_maybe_fsyncP10tokulogger10__toku_lsnib+0x1e7)[0x7fca994fa007]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z23toku_log_end_checkpointP10tokuloggerP10__toku_lsniS1_mjj+0x141)[0x7fca9951fb71]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_ZN12checkpointer18log_end_checkpointEv+0x31)[0x7fca9951ff71]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_ZN12checkpointer14end_checkpointEPFvPvES0_+0x66)[0x7fca9951dbc6]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(_Z15toku_checkpointP12checkpointerP10tokuloggerPFvPvES3_S5_S3_19checkpoint_caller_t+0x1d8)[0x7fca9951dde8]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x11afe5)[0x7fca99543fe5]
/opt/percona/mysql/current/lib/mysql/plugin/ha_tokudb.so(+0x66c69)[0x7fca9948fc69]
/opt/percona/mysql/current/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x51)[0x99f9f1]
/opt/percona/mysql/current/bin/mysqld[0xe1c6c6]
/opt/percona/mysql/current/bin/mysqld[0xe1e627]
/opt/percona/mysql/current/bin/mysqld(_Z11plugin_initPiPPci+0x949)[0xe224d9]
/opt/percona/mysql/current/bin/mysqld(_Z11mysqld_mainiPPc+0xab8)[0x9402d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fcab487e830]
/opt/percona/mysql/current/bin/mysqld[0x937955]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

There doesn’t seem to be much documentation about recovering from this point. Any ideas how to fix the recovery log?

The recommended recovery for TokuDB in case of hardware failure is to recover from backup (or from replica)

Peter is correct, at this point, I would strongly suggest recovering from backup both your InnoDB and TokuDB data. The fact that you had to force recover InnoDB all the way up to 6 illustrates some serious damage exists. TokuDB does not have any means of forceful or lossy recovery implemented. What is curious is that the call stack and assertion indicate that an fsync failed with an “EINVAL : Invalid Argument”. The only argument to fsync is the file descriptor (fd). From the fsync documentation :

“EROFS, EINVAL fd is bound to a special file which does not support synchronization.”

From this, my best guess is that something is still wrong with the hardware or you may have found some specific case of recovery where possibly the file is opened in such a way that fsync is not permitted. I have seen this happen once before where the recovery log was severely damaged (0 bytes or torn/partially written header) and was never able to diagnose the path it took to get there.

One trick that might get you past such a scenario would be to remove all existing TokuDB/PerconaFT recovery log files and copy a fresh, empty, undamaged recovery log file from a clean install that was properly shutdown (renaming is not necessary). See [url]TokuDB files and file types — Percona Server 8.0 Documentation for the naming convention and how to find it. Please note that this is a very dangerous suggestion. It may not work and it could permanently and irreparably damage your existing data set. If it does work, all changes since the last checkpoint will be lost (default checkpointing period is 60 seconds).


George O. Lorch III
Principal Software Engineer, Percona