InnoDB assertion failure crash when altering then optimizing table after upgrade from MySQL 5.7 to 8.0

Hello,

I’m currently performing some test upgrades between MySQL 5.7 and 8.0 before upgrading a production database. In my testing, I’ve found that a specific series of actions after an upgrade will cause a crash.

  1. Upgrade from 5.7 to 8.0 by removing 5.7 Percona packages, changing to the 8.0 yum repo, then installing 8.0 packages (This is being done on RHEL 8, so yum/dnf packages are being used).
  2. Start MySQL. I watch mysqld.log while it’s starting. It successfully imports the 5.7 data with no errors.
  3. Alter a table using the INSTANT algorithm, e.g.: ALTER TABLE tableNameRedacted ADD COLUMN columnNameRedacted TINYINT, ALGORITHM=INSTANT;
  4. Run an OPTIMIZE on that same table: OPTIMIZE TABLE tableName

This procedure immediately crashes mysqld, writing the following to mysqld.log:

2023-08-07T20:33:41.038422Z 18 [ERROR] [MY-013183] [InnoDB] Assertion failure: ddl0builder.cc:992:len <= col->len || DATA_LARGE_MTYPE(col->mtype) || (col->mtype == DATA_POINT && len == DATA_MBR_LEN) || ((col->mtype == DATA_VARCHAR || col->mtype == DATA_BINARY || col->mtype == DATA_VARMYSQL) && (col->len == 0 || len <= col->len + prtype_get_compression_extra(col->prtype))) thread 140190688913152
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to [http://bugs.mysql.com](http://bugs.mysql.com/).
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2023-08-07T20:33:41Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=828db7db80124cd9a47f09b8a4a6dc29ead7ca38
Server Version: 8.0.33-25 Percona Server (GPL), Release 25, Revision 60c9e2c5

Thread pointer: 0x7f2e34000ff0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f80b0369c10 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2185f51]
/usr/sbin/mysqld(print_fatal_signal(int)+0x2c2) [0x12128b2]
/usr/sbin/mysqld(my_server_abort()+0x75) [0x1212b75]
/usr/sbin/mysqld(my_abort()+0xe) [0x217fb8e]
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x309) [0x23d8219]
/usr/sbin/mysqld(ddl::Builder::copy_columns(ddl::Copy_ctx&, unsigned long&, unsigned long&)+0x278) [0x25d1568]
/usr/sbin/mysqld(ddl::Builder::copy_row(ddl::Copy_ctx&, unsigned long&)+0xde) [0x25d1afe]
/usr/sbin/mysqld(ddl::Builder::add_to_key_buffer(ddl::Copy_ctx&, unsigned long&)+0x26) [0x25d1dc6]
/usr/sbin/mysqld(ddl::Builder::bulk_add_row(ddl::Cursor&, ddl::Row&, unsigned long, std::function<dberr_t ()>&&)+0xf8) [0x25d77f8]
/usr/sbin/mysqld(ddl::Builder::add_row(ddl::Cursor&, ddl::Row&, unsigned long, std::function<dberr_t ()>&&)+0xc5) [0x25d7f45]
/usr/sbin/mysqld() [0x25262b4]
/usr/sbin/mysqld() [0x2526609]
/usr/sbin/mysqld(Parallel_reader::Ctx::traverse_recs(PCursor*, mtr_t*)+0x466) [0x23382d6]
/usr/sbin/mysqld(Parallel_reader::Ctx::traverse()+0x148) [0x2338608]
/usr/sbin/mysqld(Parallel_reader::worker(Parallel_reader::Thread_ctx*)+0x248) [0x233e768]
/usr/sbin/mysqld(Parallel_reader::spawn(unsigned long)+0x24) [0x233f124]
/usr/sbin/mysqld(Parallel_reader::run(unsigned long)+0x36) [0x233f1b6]
/usr/sbin/mysqld(ddl::Parallel_cursor::scan(std::vector<ddl::Builder*, ut::allocator<ddl::Builder*, ut::detail::allocator_base_pfs<ddl::Builder*> > >&)+0xa2b) [0x2527eeb]
/usr/sbin/mysqld(ddl::Loader::scan_and_build_indexes()+0xd2) [0x2523822]
/usr/sbin/mysqld(ddl::Loader::build_all()+0xf0) [0x2523960]
/usr/sbin/mysqld(ddl::Context::build()+0x38) [0x25176d8]
/usr/sbin/mysqld(bool ha_innobase::inplace_alter_table_impl<dd::Table>(TABLE*, Alter_inplace_info*)+0x4ba) [0x220ba5a]
/usr/sbin/mysqld(ha_innobase::inplace_alter_table(TABLE*, Alter_inplace_info*, dd::Table const*, dd::Table*)+0x51) [0x220be11]
/usr/sbin/mysqld() [0x113a7cd]
/usr/sbin/mysqld(mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*)+0x4d46) [0x11503b6]
/usr/sbin/mysqld(mysql_recreate_table(THD*, Table_ref*, bool)+0x34e) [0x1150e9e]
/usr/sbin/mysqld() [0xfdbab7]
/usr/sbin/mysqld(Sql_cmd_optimize_table::execute(THD*)+0x115) [0xfdc535]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0xb09) [0x108a979]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x5cb) [0x108e57b]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2487) [0x1091027]
/usr/sbin/mysqld(do_command(THD*)+0x1d6) [0x1091bb6]
/usr/sbin/mysqld() [0x1202668]
/usr/sbin/mysqld() [0x261b0a4]
/lib64/libpthread.so.0(+0x81ca) [0x7f85b23561ca]
/lib64/libc.so.6(clone+0x43) [0x7f85b06fae73]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f2e34be8350): OPTIMIZE TABLE tableNameRedacted
Connection ID (thread ID): 18
Status: NOT_KILLED

Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

I’ve been able to reproduce this issue multiple times. Curiously enough, if I use Percona xtrabackup to transfer the data from a 5.7 instance to a fresh 8.0 instance, this problem does not happen. I even see messages in the log indicating that it’s importing 5.7 data when I first start the 8.0 instance, but I can run through the same ALTER INSTANT/OPTIMIZE procedure without a crash as long as I’ve run the data through xtrabackup first.

I’d like to better understand this behavior before upgrading our production database servers. Have I encountered a bug? Any ideas why xtrabackup makes this problem go away?

Hello @resuni,
Can you repeat your steps above, but BEFORE #3 ALTER, run OPTIMIZE TABLE first, then ALTER it. Does that also result in a crash? That is “test A”. I have another one. Before step #1, while still on 5.7, OPTIMIZE TABLE, then do all steps. This is “test B”. Does that cause crash?

@resuni can you please mention the exact version of MySQL 8.0 please e.g 8.0.32

Hello @resuni

It looks like this is the bug [PS-8770] INSTANT DDL crashes 8.0.32 after upgrading from 5.7.41 - Percona JIRA
Fixed in PS-8.0.34

@lalit.choudhary This is good news, thank you. I’m eagerly anticipating this update. Can you speculate as to why I wouldn’t experience this bug after importing data with xtrabackup?

@matthewb I will try your tests as soon as I find a way to run this with less data. Right now I’m testing on production data, which takes hours to restore every time I crash a database. I plan to attempt to reproduce this with a smaller sample size.

@resuni,
If you’re on ZFS, BTRFS, EBS, or use LVM, you can use snapshots to make this process quicker. :slight_smile:

Here is the patch for the fix with comments from Oracle

Based on Oracle’s comments, my guess is that since Xtrabackup “cleans” the state of InnoDB before starting MySQL 8, the issue isn’t seen. But when you do a direct 5.7->8, the incorrect state from 5.7 remains in cache and mysql 8, on first start, did not clean that cache properly.

1 Like

If you’re on ZFS, BTRFS, EBS, or use LVM, you can use snapshots to make this process quicker.

Unfortunately, it’s just an XFS file system, but it certainly wouldn’t be a bad idea to consider switching the next time I have to perform a restore.

Based on Oracle’s comments, my guess is that since Xtrabackup “cleans” the state of InnoDB before starting MySQL 8, the issue isn’t seen. But when you do a direct 5.7->8, the incorrect state from 5.7 remains in cache and mysql 8, on first start, did not clean that cache properly.

I’m wondering if I can use xtrabackup to “clean” the data directory without actually transferring from another server. This will be something else I test once I am able.