xtrabackup 8 stuck on prepare

Hi,

we have a problem with the prepare process on a freshly taken backup. The prepare starts and runs for a while and then suddenly it gets stuck. The process is still running, but it is not doing anything.

I waited for more than 12 hours, to be sure that there is no long lasting process in the log, but still nothing happened.

We are using xtrabackup 8.0.5 with a fresh installation of MySQL 8.0.14.
The data for the MySQL-Server has been restored from a backup of a MySQL 5.7.22 and then upgraded with mysql_upgrade.
After preparing the data we installed xtrabackup 8.0.5 and took a backup, which we tried to prepare then.

Here is the last output from prepare process:

xtrabackup: recognized server arguments: --innodb_checksum_algorithm=innodb --innodb_log_checksums=1 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_files_in_group=2 --innodb_log_file_size=2684354560 --innodb_page_size=16384 --innodb_undo_directory=./ --innodb_undo_tablespaces=2 --server-id=303 --innodb_log_checksums=ON --innodb_redo_log_encrypt=0 --innodb_undo_log_encrypt=0
xtrabackup: recognized client arguments: --prepare=1 --use-memory=2GB --target-dir=/backup-storage/backup_dir/
/usr/bin/xtrabackup version 8.0.5 based on MySQL server 8.0.14 Linux (x86_64) (revision id: 40ec8a3)
xtrabackup: cd to /backup-storage/backup_dir//
xtrabackup: This target seems to be not prepared yet.
Number of pools: 1
xtrabackup: xtrabackup_logfile detected: size=10854268928, start_lsn=(65697107320549)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = .
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 10854268928
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = .
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 10854268928
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 2147483648 bytes for buffer pool (set by --use-memory parameter)
PUNCH HOLE support available
Mutexes and rw_locks use GCC atomic builtins
Uses event mutexes
GCC builtin __atomic_thread_fence() is used for memory barrier
Compressed tables use zlib 1.2.7
Number of pools: 1
Using CPU crc32 instructions
Directories to scan '.;.;.'
Scanning './'
Thread# 0 - Checked 487/2473 files
Thread# 0 - Checked 884/2473 files
Thread# 0 - Checked 1308/2473 files
Thread# 0 - Checked 1823/2473 files
Thread# 0 - Checked 2408/2473 files
Checked 2473 files
Completed space ID check of 2475 files.
Initializing buffer pool, total size = 2.000000G, instances = 1, chunk size =128.000000M
Completed initialization of buffer pool
page_cleaner coordinator priority: -20
page_cleaner worker priority: -20
page_cleaner worker priority: -20
page_cleaner worker priority: -20
The log sequence number 65634873689279 in the system tablespace does not match the log sequence number 65697107320549 in the ib_logfiles!
Database was not shutdown normally!
Starting crash recovery.
Starting to parse redo log at lsn = 65697107320655, whereas checkpoint_lsn = 65697107320549
Doing recovery: scanned up to log sequence number 65697112563200
...
Doing recovery: scanned up to log sequence number 65697815109120
Applying a batch of 83028 redo log records ...
4.6%
10%
...
100%
Apply batch completed!
Doing recovery: scanned up to log sequence number 65697820352000
...
Doing recovery: scanned up to log sequence number 65698056281600
my_realpath(./analysis/#sql-ib1616234-2716536971.ibd) failed!
my_realpath(./analysis/dcd_inv.ibd) failed!
Creating the tablespace : ./analysis/dcd_inv.ibd, space_id : 292372
my_realpath(./analysis/#sql-ib1616234-2716536971.ibd) failed!
Doing recovery: scanned up to log sequence number 65698061524480
...
Doing recovery: scanned up to log sequence number 65698155896320
my_realpath(./analysis/#sql-ib1616235-2716536972.ibd) failed!
my_realpath(./analysis/dcd_loc.ibd) failed!
Creating the tablespace : ./analysis/dcd_loc.ibd, space_id : 292373
my_realpath(./analysis/#sql-ib1616235-2716536972.ibd) failed!
Doing recovery: scanned up to log sequence number 65698161139200
...
Doing recovery: scanned up to log sequence number 65698622512640
Applying a batch of 190541 redo log records ...
1.9%
6%
...
30%
Extending tablespace : 150110 space name: onion/customer_visit_line_upload_log to new size: 8228608 pages during recovery.
...
Extending tablespace : 150110 space name: onion/customer_visit_line_upload_log to new size: 8229120 pages during recovery.
40%
50%
Extending tablespace : 220694 space name: mwf/belpos#P#p201904 to new size: 31744 pages during recovery.
59%
...
70%
Extending tablespace : 119368 space name: mwf_sessions/sessions to new size: 670976 pages during recovery.
...
Extending tablespace : 119368 space name: mwf_sessions/sessions to new size: 671744 pages during recovery.
76%
Extending tablespace : 119368 space name: mwf_sessions/sessions to new size: 672000 pages during recovery.
...
Extending tablespace : 119368 space name: mwf_sessions/sessions to new size: 673792 pages during recovery.
80%
80%

Here is the actual stacktrace:

Thread 16 (Thread 0x7f1ca900f700 (LWP 8070)):
#0 0x00007f1cacbe7965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000012f8760 in os_event::wait_low(long) ()
#2 0x00000000012f326b in os_aio_handler(unsigned long, fil_node_t**, void**, IORequest*) ()
#3 0x00000000011b0472 in fil_aio_wait(unsigned long) ()
#4 0x00000000013abcb8 in io_handler_thread ()
#5 0x0000000000b99913 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable, void (*)(unsigned long), unsigned long> > >::_M_run() ()
#6 0x00000000021cc0af in execute_native_thread_routine ()
#7 0x00007f1cacbe3dd5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7f1ca880e700 (LWP 8071)):
#0 0x00007f1cacbe7965 in pthread_cond_wait&#64;&#64;GLIBC_2.3.2 () from /lib64/libpthread.so.0
...
Thread 6 (Thread 0x7f1c1cef7700 (LWP 8080)):
#0 0x00007f1cacbe7965 in pthread_cond_wait&#64;&#64;GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000012f8760 in os_event::wait_low(long) ()
#2 0x0000000001111561 in buf_flush_page_coordinator_thread(unsigned long) ()
#3 0x0000000000b99913 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable, void (*)(unsigned long), unsigned long> > >::_M_run() ()
#4 0x00000000021cc0af in execute_native_thread_routine ()
#5 0x00007f1cacbe3dd5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f1c17fff700 (LWP 8081)):
#0 0x00007f1cacbe7965 in pthread_cond_wait&#64;&#64;GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000012f8760 in os_event::wait_low(long) ()
#2 0x0000000001110967 in buf_flush_page_cleaner_thread() ()
#3 0x0000000000b9980e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable, void (*)()> > >::_M_run() ()
#4 0x00000000021cc0af in execute_native_thread_routine ()
#5 0x00007f1cacbe3dd5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f1c177fe700 (LWP 8082)):
#0 0x00007f1cacbe7965 in pthread_cond_wait&#64;&#64;GLIBC_2.3.2 () from /lib64/libpthread.so.0
...
#6 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f1c16ffd700 (LWP 8083)):
#0 0x00007f1cacbe7965 in pthread_cond_wait&#64;&#64;GLIBC_2.3.2 () from /lib64/libpthread.so.0
...
#6 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f1be7fff700 (LWP 8084)):
#0 0x00007f1cacbeae3d in nanosleep () from /lib64/libpthread.so.0
#1 0x00000000012cfc25 in recv_writer_thread() ()
#2 0x0000000000b9980e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable, void (*)()> > >::_M_run() ()
#3 0x00000000021cc0af in execute_native_thread_routine ()
#4 0x00007f1cacbe3dd5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f1caa92df6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f1cacff5940 (LWP 8051)):
#0 0x00007f1caa912e07 in sched_yield () from /lib64/libc.so.6
#1 0x00000000011c069d in Fil_shard::mutex_acquire_and_get_space(unsigned int, fil_space_t*&) ()
#2 0x00000000011c0934 in Fil_shard::space_extend(fil_space_t*, unsigned int) ()
#3 0x00000000011c1345 in Fil_shard::get_file_for_io(IORequest const&, fil_space_t*, unsigned int*, fil_node_t*&) ()
#4 0x00000000011c1e79 in Fil_shard::do_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) ()
#5 0x00000000011c277b in fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) ()
#6 0x0000000001120e53 in buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t const&, page_size_t const&, bool) ()
#7 0x0000000001123215 in buf_read_recv_pages(bool, unsigned int, unsigned int const*, unsigned long) ()
#8 0x00000000012d42ce in recv_apply_hashed_log_recs(log_t&, bool) ()
#9 0x00000000012d74cd in recv_recovery_from_checkpoint_start(log_t&, unsigned long, unsigned long) ()
#10 0x00000000013b7e8d in srv_start(bool, std::string const&, unsigned long) ()
#11 0x0000000000ba10fc in innodb_init(bool, bool) ()
#12 0x0000000000ba8990 in xtrabackup_prepare_func(int, char**) ()
#13 0x0000000000b6abb0 in main ()

I added the full stacktrace as attachment: xtrackup-8-prepare-stacktrace.txt

Could you please give us a hint, what is going wrong?

xtrackup-8-prepare-stacktrace.txt (11.3 KB)

Thanks for your post. It looks as this might be related to this known issue [URL][PXB-1819] PXB 8.0 Freezes during prepare - Percona JIRA

Could you verify that for us? If that’s the case, there’s work underway on resolving that, but obviously if you have a variation on that scenario, that would be good to know too.

Yes, looks as if it is the same issue.

Here ist the IO check:

date; cat /proc/9844/io
Thu Apr 11 14:05:41 CEST 2019
rchar: 7077085096
wchar: 1685294899
syscr: 292926
syscw: 31367
read_bytes: 34934784
write_bytes: 1692946432
cancelled_write_bytes: 0

date; cat /proc/9844/io
Thu Apr 11 14:05:50 CEST 2019
rchar: 7077085096
wchar: 1685294899
syscr: 292926
syscw: 31367
read_bytes: 34934784
write_bytes: 1692946432
cancelled_write_bytes: 0

And here the output of pt-pmp:

Thu Apr 11 14:04:34 CEST 2019
10 pthread_cond_wait,os_event::wait_low,os_aio_handler,fil_aio_wait,io_handler_thread,std::thread::_State_impl::_M_run,execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)
3 pthread_cond_wait,os_event::wait_low,buf_flush_page_cleaner_thread,std::thread::_State_impl::_M_run,execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)
1 sched_yield(libc.so.6),Fil_shard::mutex_acquire_and_get_space,Fil_shard::space_extend,Fil_shard::get_file_for_io,Fil_shard::do_io,fil_io,buf_read_page_low,buf_read_recv_pages,recv_apply_hashed_log_recs,recv_recovery_from_checkpoint_start,srv_start,innodb_init,xtrabackup_prepare_func,main
1 pthread_cond_wait,os_event::wait_low,buf_flush_page_coordinator_thread,std::thread::_State_impl::_M_run,execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)
1 nanosleep(libpthread.so.0),recv_writer_thread,std::thread::_State_impl::_M_run,execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)

And top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9844 root 20 0 5682196 4.4g 17052 R 100.0 1.2 12:19.62 xtrabackup

Is there any time scheduled, when this bug will be fixed and released?

Yes, I can confirm, that it is the same issue.

Is there any time schedule for a release of the fix for this bug?

I just tried to compile it myself, following the given instructions, but it fails always:

cmake .. -DBUILD_CONFIG=xtrabackup_release -DWITH_MAN_PAGES=OFF && make -j4
-- Running cmake version 2.8.12.2
-- Found Git: /bin/git (found version "1.8.3.1")
-- This is Linux version 4.14.35-1844.3.2.el7uek.x86_64
-- We probably need some devtoolset compiler
CMake Warning at CMakeLists.txt:178 (MESSAGE):
Could not find devtoolset gcc


-- MySQL 8.0.14
-- Xtrabackup 8.0.5
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /bin/cc
-- Check for working C compiler: /bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /bin/c++
-- Check for working CXX compiler: /bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Source directory /backup-storage/percona/percona-xtrabackup
-- Binary directory /backup-storage/percona/percona-xtrabackup/build
-- CMAKE_GENERATOR: Unix Makefiles
-- Looking for include file libaio.h
-- Looking for include file libaio.h - found
-- Looking for io_queue_init in aio
-- Looking for io_queue_init in aio - found
-- Looking for SHM_HUGETLB
-- Looking for SHM_HUGETLB - found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void *
-- Check size of void * - done
-- SIZEOF_VOIDP 8
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE - Failed
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE_WORKAROUND
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE_WORKAROUND - Failed
CMake Error at cmake/compiler_bugs.cmake:34 (MESSAGE):
HAVE_NOT_C_BUG_LOOP_VECTORIZE failed and workaround
'-fvect-cost-model=cheap -fno-tree-loop-distribute-patterns
-fno-tree-loop-vectorize' not working!
Call Stack (most recent call first):
cmake/compiler_bugs.cmake:108 (CHECK_C_COMPILER_BUG)
cmake/build_configurations/compiler_options.cmake:25 (INCLUDE)
CMakeLists.txt:395 (INCLUDE)


-- Configuring incomplete, errors occurred!
See also "/backup-storage/percona/percona-xtrabackup/build/CMakeFiles/CMakeOutput.log".
See also "/backup-storage/percona/percona-xtrabackup/build/CMakeFiles/CMakeError.log".

We’re running on Oracle Linux 7.6.

Can you give us a hint on what could be the problem?

CMakeError.log.txt (3.36 KB)

CMakeOutput.log.txt (30.2 KB)

Is there any news on this case?

Apologies for the delay - I think I missed the notification.

I have spoken with the team and here’s their suggestion…
[LIST]
[]You will need to install devtoolset-7 ([url]Software Collections — Software Collections) if you are on rhel/centos/oracle linux 7.
[
]MySQL/PS/PXB 8.0 needs a compiler with C++11 support and in the case of GCC, its version should be at least 5.3
[*]You will also need to add -DWITH_BOOST={dir} and -DDOWNLOAD_BOOST=1 to the cmake command
[/LIST]
We’ll update the documentation as we realize this information could be more specific…

Thanks for the tip, but I still get the same error:

cmake ../percona-xtrabackup/ -DWITH_BOOST=~/boost -DDOWNLOAD_BOOST=1 -DBUILD_CONFIG=xtrabackup_release -DWITH_MAN_PAGES=OFF && make -j4
-- Running cmake version 2.8.12.2
-- Found Git: /bin/git (found version "1.8.3.1")
-- This is Linux version 4.14.35-1844.3.2.el7uek.x86_64
-- We probably need some devtoolset compiler
-- Using /opt/rh/devtoolset-7/root/usr/bin/gcc
-- Using /opt/rh/devtoolset-7/root/usr/bin/g++
-- MySQL 8.0.14
-- Xtrabackup 8.0.5
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/gcc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/g++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Source directory /backup-storage/percona/percona-xtrabackup
-- Binary directory /backup-storage/percona/percona-xtrabackup-build
-- CMAKE_GENERATOR: Unix Makefiles
-- Looking for include file libaio.h
-- Looking for include file libaio.h - found
-- Looking for io_queue_init in aio
-- Looking for io_queue_init in aio - found
-- Looking for SHM_HUGETLB
-- Looking for SHM_HUGETLB - found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void *
-- Check size of void * - done
-- SIZEOF_VOIDP 8
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE - Failed
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE_WORKAROUND
-- Performing Test HAVE_NOT_C_BUG_LOOP_VECTORIZE_WORKAROUND - Failed
CMake Error at cmake/compiler_bugs.cmake:34 (MESSAGE):
HAVE_NOT_C_BUG_LOOP_VECTORIZE failed and workaround
'-fvect-cost-model=cheap -fno-tree-loop-distribute-patterns
-fno-tree-loop-vectorize' not working!
Call Stack (most recent call first):
cmake/compiler_bugs.cmake:108 (CHECK_C_COMPILER_BUG)
cmake/build_configurations/compiler_options.cmake:25 (INCLUDE)
CMakeLists.txt:395 (INCLUDE)


-- Configuring incomplete, errors occurred!
See also ".../percona/percona-xtrabackup-build/CMakeFiles/CMakeOutput.log".
See also "... /percona/percona-xtrabackup-build/CMakeFiles/CMakeError.log".

See attached logs:

CMakeError.log.txt (3.71 KB)

CMakeOutput.log.txt (36.2 KB)

I just checked again my gcc version. Without having devtoolset-7 enabled, the gcc version is 4.8.5. After enabling the devtoolset with

scl enable devtoolset-7 bash

, the version ist 7.3.1.
But the compiling still fails with enabled devtoolset-7.

Sorry for the delay with this, the latest version of Percona XtraBackup was released on May 9. Has this helped you with your issues?