InnoDB: Assertion failure: log0recv.cc:4463:log.m_files.find(recovered_lsn) != log.m_files.end()

Hi,

We’re using pxc-db with the pxc-operator on Kubernetes and running into an issue restoring the backups created by xtrabackup. Every time we try to restore a backup, the LSNs never match and the process crashes:

2023-08-28T08:57:59.035772-00:00 0 [Note] [MY-013883] [InnoDB] The latest found checkpoint is at lsn = 189492706578 in redo log file ./#innodb_redo/#ib_redo0.
2023-08-28T08:57:59.036068-00:00 0 [Note] [MY-012560] [InnoDB] The log sequence number 189491944578 in the system tablespace does not match the log sequence number 189492706578 in the redo log files!
2023-08-28T08:57:59.036176-00:00 0 [Note] [MY-012551] [InnoDB] Database was not shutdown normally!
2023-08-28T08:57:59.036267-00:00 0 [Note] [MY-012552] [InnoDB] Starting crash recovery.
InnoDB: Assertion failure: log0recv.cc:4463:log.m_files.find(recovered_lsn) != log.m_files.end()
InnoDB: thread 139932499097920InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.percona.com/projects/PXB.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2023-08-28T08:57:59Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
xtrabackup(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x1968cc1]
xtrabackup(print_fatal_signal(int)+0x3bc) [0xe6b8fc]
xtrabackup(handle_fatal_signal+0x95) [0xe6b9a5]
/lib64/libc.so.6(+0x54df0) [0x7f44930f0df0]
/lib64/libc.so.6(+0xa154c) [0x7f449313d54c]
/lib64/libc.so.6(raise+0x16) [0x7f44930f0d46]
/lib64/libc.so.6(abort+0xd3) [0x7f44930c47f3]
xtrabackup(my_abort()+0xe) [0x196448e]
xtrabackup(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0xe4) [0x1238db4]
xtrabackup(recv_recovery_from_checkpoint_start(log_t&, unsigned long, unsigned long)+0x1a73) [0x117bb53]
xtrabackup(srv_start(bool, unsigned long)+0x104c) [0x1228e0c]
xtrabackup() [0x1fb08b0]
xtrabackup() [0x965a29]
xtrabackup(main+0x13c7) [0x907687]
/lib64/libc.so.6(+0x3feb0) [0x7f44930dbeb0]
/lib64/libc.so.6(__libc_start_main+0x80) [0x7f44930dbf60]
xtrabackup(_start+0x25) [0x934ba5]

We would expect crash recovery to start, but it simply never does. Trying to start it manually from the command line yields the same error, even when I add --innodb-force-recovery=6 (I tried 1-5 as well, of course). The redo log size is 2G, which I’m fairly certain should be enough to contain all changes since the start of the backup (the entire database is only 20G).
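For reference, this is roughly how the manual attempt looked; the paths below are placeholders rather than our real layout:

xtrabackup --prepare --target-dir=/backup/restored-base
mysqld --datadir=/data/restored --innodb-force-recovery=6

Both commands end in the same log0recv.cc assertion shown above.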

We’re unsure how to solve this issue moving forward, any help would be appreciated!

This was using the latest image, percona-xtradb-cluster-operator:1.13.0-pxc8.0-backup-pxb8.0.32. Switching back to the previous image, percona-xtradb-cluster-operator:1.12.0-pxc8.0-backup, seems to have solved the issue.
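In case it helps anyone else: the only change we made was pinning the backup image back to the 1.12.0 tag in the cluster CR, roughly like this (cluster name and namespace are placeholders, and double-check the spec.backup.image path against your own cr.yaml):

kubectl -n pxc patch pxc my-cluster --type=merge \
  -p '{"spec":{"backup":{"image":"percona-xtradb-cluster-operator:1.12.0-pxc8.0-backup"}}}'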

Hi @timstoop

Can you please describe the steps that led to the error, in particular regarding the backup you are trying to use? Was this backup taken with a different operator version, or are the versions aligned?

Thanks in advance.

@timstoop can you provide a full log from the restore job/pod?

The cluster was originally deployed using an older version of pxc-operator, but the operator itself was upgraded to 1.13.1 on August 8. I assume the backup was created by the operator using the latest image, but I’m not sure. In any case, we did not override any image tag for the backup job.
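To double-check which image the backup job actually ran with, I’ll read it from the job spec with something like this (the namespace is a placeholder for ours):

kubectl -n pxc get jobs -o jsonpath='{range .items[*]}{.metadata.name}{"  "}{.spec.template.spec.containers[*].image}{"\n"}{end}'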