[ERROR] [MY-012800] [InnoDB] [FATAL] Unknown error code 21: Skip locked records

Pavlo_Tkachenko · October 11, 2023, 12:45pm

we have a cluster with 3 nodes, but recently it crashes with the following error logs:

2 nodes crash with this error and the cluster stops accepting requests.
2023-10-10T23:20:02.208122Z 703 [ERROR] [MY-012800] [InnoDB] [FATAL] Unknown error code 21: Skip locked records 2023-10-10T23:20:02.208152Z 703 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0mysql.cc:1220:ib::fatal triggered thread 140339245459008 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to [http://bugs.mysql.com](http://bugs.mysql.com/). InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 2023-10-10T23:20:02Z UTC - mysqld got signal 6 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. BuildID[sha1]=3f6331e0284addef5a5d82661c36cc7b6a6dc15a Server Version: 8.0.33-25.1 Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3, wsrep_26.1.4.3 Thread pointer: 0x7fa2b0006780 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fa346df9b70 thread_stack 0x100000 /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x56234e1a5351] /usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x56234d1d6a4f] /usr/sbin/mysqld(my_server_abort()+0x7e) [0x56234d1d6c0e] /usr/sbin/mysqld(my_abort()+0xe) [0x56234e19f0be] /usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x56234e40723a] /usr/sbin/mysqld(ib::fatal::~fatal()+0xc8) [0x56234e409c58] /usr/sbin/mysqld(row_mysql_handle_errors(dberr_t*, trx_t*, que_thr_t*, trx_savept_t*)+0x13c) [0x56234e34dccc] /usr/sbin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x72d) [0x56234e381f0d] /usr/sbin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x584) [0x56234e20a044] /usr/sbin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x451) [0x56234cd25551] /usr/sbin/mysqld(RefIterator<false>::Read()+0x17d) [0x56234d369ecd] /usr/sbin/mysqld(FilterIterator::Read()+0x18) [0x56234d35b618] /usr/sbin/mysqld(LimitOffsetIterator::Read()+0x7d) [0x56234d35b7ad] /usr/sbin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x2a3) [0x56234d1328c3] /usr/sbin/mysqld(Query_expression::execute(THD*)+0x30) [0x56234d132c70] /usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x189) [0x56234d0a8d89] /usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x1ef4) [0x56234d047a24] /usr/sbin/mysqld(Prepared_statement::execute(THD*, String*, bool)+0x6b0) [0x56234d07d770] /usr/sbin/mysqld(Prepared_statement::execute_loop(THD*, String*, bool)+0x12d) [0x56234d08245d] /usr/sbin/mysqld(mysqld_stmt_execute(THD*, Prepared_statement*, bool, unsigned long, PS_PARAM*)+0x1d4) [0x56234d082af4] /usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x159e) [0x56234d04ee5e] /usr/sbin/mysqld(do_command(THD*)+0x204) [0x56234d0518e4] /usr/sbin/mysqld(+0x13d2338) [0x56234d1c6338] /usr/sbin/mysqld(+0x2855f89) [0x56234e649f89] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fa39f478ac3] /lib/x86_64-linux-gnu/libc.so.6(+0x126a40) [0x7fa39f50aa40] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fa2b14d11d8): select * from `jobs` where `queue` = 'default' and ((`reserved_at` is null and `available_at` <= 1696980002) or (`reserved_at` <= 1696979912)) order by `id` asc limit 1 FOR UPDATE SKIP LOCKED Connection ID (thread ID): 703 Status: NOT_KILLED You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash. 2023-10-10T23:20:02.224072Z 703 [Note] [MY-000000] [WSREP] Initiating SST cancellation Log of wsrep recovery (--wsrep-recover): INFO: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery_verbose.tSIUOD' --pid-file='/var/lib/mysql/pxc-cluster-node-1-recover.pid' INFO: WSREP: Recovered position 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:6972944

Our settings for nodes are as follows:

# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid

max_allowed_packet = 128M


# pmm
slow_query_log=ON
log_output=FILE
long_query_time=0
log_slow_admin_statements=ON
log_slow_slave_statements=ON

log_slow_rate_limit=100
log_slow_rate_type='query'
slow_query_log_always_write_time=1
log_slow_verbosity='full'
slow_query_log_use_global_control='all'

userstat=ON

performance_schema=ON
performance-schema-instrument='statement/%=ON'
performance-schema-consumer-statements-digest=ON
innodb_monitor_enable=all

# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800

######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib/galera4/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm:/server1,server2,server3
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# Slave thread to use
wsrep_slave_threads=8

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2

# Node IP address
wsrep_node_address=server1
# Cluster name
wsrep_cluster_name=pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
wsrep_node_name=pxc-cluster-node-1

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=PERMISSIVE

# SST method
wsrep_sst_method=xtrabackup-v2

[mysqld]
wsrep_provider_options=socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem

[sst]
encrypt=4
ssl-key=server-key.pem
ssl-ca=ca.pem
ssl-cert=server-cert.pem

Please help analyze this and fix it, maybe someone has already faced something like this!?

matthewb · October 12, 2023, 1:55am

Hello @Pavlo_Tkachenko,
Can you confirm this query worked in a previous version of PXC? Did you recently upgrade PXC?

select * from jobs where queue = ‘default’ and ((reserved_at is null and available_at <= 1696980002) or (reserved_at <= 1696979912)) order by id asc limit 1 FOR UPDATE SKIP LOCKED

If this did work before, I’d ask you to please open a bug report at https://jira.percona.com/ and supply our engineers with the table schema for jobs table so that they can attempt to recreate it.

Pavlo_Tkachenko · October 13, 2023, 6:54am

Hi @matthewb , thanks for your reply. No, we didn’t update pxc, and we didn’t use early versions. Our current version is Ver 8.0.33-25.

gsdj · February 19, 2024, 4:49pm

Hi there, I have exactly the same problem after migrating from a standalone MySQL 8.0.16 to a 3-node 8.0.35-27.1 Percona XtraDB Cluster.
The query is from a Laravel app:
select * from jobs where queue = ‘juno’ and ((reserved_at is null and available_at <= 1708282931) or (reserved_at <= 1708282631)) order by id asc limit 1 FOR UPDATE SKIP LOCKED

The first search results pointed at a InnoDB file corruption, so I dropped the “jobs” table and rebuilt it from the dump. I did this on every node, to no avail. After that, I renamed the table and created it from the dump again, to avoid reusage of the corrupted file, if that is even possible.

The nodes crash randomly, once every 1-5 hours, node1 seems to be the most stable one.

The crash dump says:

InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
""""""""""""""2024-02-18T19:02:11Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=82e0381828780a878c13947ae9217abad3e30d93
Server Version: 8.0.35-27.1 Percona XtraDB Cluster (GPL), Release rel27, Revision 84d9464, WSREP version 26.1.4.3, wsrep_26.1.4.3

Thread pointer: 0x7f00322f5780
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f19960ebb70 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x5562434fbce1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x556242526b2f]
/usr/sbin/mysqld(my_server_abort()+0x7e) [0x556242526cee]
/usr/sbin/mysqld(my_abort()+0xe) [0x5562434f5a5e]
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x55624376219a]
/usr/sbin/mysqld(ib::fatal::~fatal()+0xc8) [0x556243764bd8]
/usr/sbin/mysqld(row_mysql_handle_errors(dberr_t*, trx_t*, que_thr_t*, trx_savept_t*)+0x13c) [0x5562436a894c]
/usr/sbin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x72d) [0x5562436dd2dd]
/usr/sbin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x584) [0x556243563da4]
/usr/sbin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x451) [0x556242070671]
/usr/sbin/mysqld(RefIterator<false>::Read()+0x17d) [0x5562426ba9ad]
/usr/sbin/mysqld(FilterIterator::Read()+0x18) [0x5562426ac0c8]
/usr/sbin/mysqld(LimitOffsetIterator::Read()+0x7d) [0x5562426ac25d]
/usr/sbin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x2a3) [0x556242481c03]
/usr/sbin/mysqld(Query_expression::execute(THD*)+0x30) [0x556242481fb0]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x1ec) [0x5562423f810c]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x1eef) [0x556242395f5f]
/usr/sbin/mysqld(Prepared_statement::execute(THD*, String*, bool)+0x6b0) [0x5562423cbdd0]
/usr/sbin/mysqld(Prepared_statement::execute_loop(THD*, String*, bool)+0x12d) [0x5562423d0b4d]
/usr/sbin/mysqld(mysqld_stmt_execute(THD*, Prepared_statement*, bool, unsigned long, PS_PARAM*)+0x1d4) [0x5562423d1214]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x159e) [0x55624239d3ce]
/usr/sbin/mysqld(do_command(THD*)+0x204) [0x55624239fe84]
/usr/sbin/mysqld(+0x13e5418) [0x556242516418]
/usr/sbin/mysqld(+0x2872e99) [0x5562439a3e99]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f19d669aac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f19d672c850]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f00312122b8): select * from `jobs` where `queue` = '<different_queue_names>' and ((`reserved_at` is null and `available_at` <= 1708282931) or (`reserved_at` <= 1708282631)) order by `id` asc limit 1 FOR UPDATE SKIP LOCKED
Connection ID (thread ID): 19759
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
2024-02-18T19:02:11.439065Z 19759 [Note] [MY-000000] [WSREP] Initiating SST cancellation

The server config is:

# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
server-id=1
datadir=/sql/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
log-error-verbosity = 1
log_error_suppression_list = "013360"
pid-file=/var/run/mysqld/mysqld.pid

binlog_expire_logs_seconds=259200

wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_provider_options="gcache.size=1G"

wsrep_cluster_address = "gcomm://192.168.4.11,192.168.4.12,192.168.4.13"

binlog_format=ROW

wsrep_applier_threads=8

wsrep_log_conflicts

innodb_autoinc_lock_mode=2

wsrep_cluster_name = "perconacluster1"

wsrep_node_name = sqlnode1

pxc_strict_mode=ENFORCING

wsrep_sst_method=xtrabackup-v2

default_storage_engine = InnoDB
log_bin = binary_log
log_replica_updates = 1

innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 2

bind_address = 192.168.4.11,192.168.5.11

default-time-zone='Europe/Berlin'

sql_mode = ALLOW_INVALID_DATES

[galera]

wsrep_node_address = 192.168.4.11
wsrep_sst_receive_address = 192.168.4.11

wsrep_sst_auth = username:password

Topic		Replies	Views
Assertion failure in file lock0lock.cc line 3980 Percona XtraDB Cluster 5.x	3	2372	October 10, 2014
5.5.21 Crash then auto recovery again and again ,but always failed. Other MySQL® Questions	1	726	April 29, 2014
Percona XtraDB Cluster crashes after updating to version 8.0.35-27.1 Percona XtraDB Cluster 5.x	5	855	March 1, 2024
MySQL DB Instance Crashes with Assertion failure Other MySQL® Questions	1	1032	December 16, 2010
Percona Server fails Other MySQL® Questions	2	755	September 24, 2014

[ERROR] [MY-012800] [InnoDB] [FATAL] Unknown error code 21: Skip locked records

Related topics