we have a cluster with 3 nodes, but recently it crashes with the following error logs:
2 nodes crash with this error and the cluster stops accepting requests.
2023-10-10T23:20:02.208122Z 703 [ERROR] [MY-012800] [InnoDB] [FATAL] Unknown error code 21: Skip locked records 2023-10-10T23:20:02.208152Z 703 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0mysql.cc:1220:ib::fatal triggered thread 140339245459008 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to [http://bugs.mysql.com](http://bugs.mysql.com/). InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 2023-10-10T23:20:02Z UTC - mysqld got signal 6 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. BuildID[sha1]=3f6331e0284addef5a5d82661c36cc7b6a6dc15a Server Version: 8.0.33-25.1 Percona XtraDB Cluster (GPL), Release rel25, Revision 0c56202, WSREP version 26.1.4.3, wsrep_26.1.4.3 Thread pointer: 0x7fa2b0006780 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fa346df9b70 thread_stack 0x100000 /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x56234e1a5351] /usr/sbin/mysqld(print_fatal_signal(int)+0x39f) [0x56234d1d6a4f] /usr/sbin/mysqld(my_server_abort()+0x7e) [0x56234d1d6c0e] /usr/sbin/mysqld(my_abort()+0xe) [0x56234e19f0be] /usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x33a) [0x56234e40723a] /usr/sbin/mysqld(ib::fatal::~fatal()+0xc8) [0x56234e409c58] /usr/sbin/mysqld(row_mysql_handle_errors(dberr_t*, trx_t*, que_thr_t*, trx_savept_t*)+0x13c) [0x56234e34dccc] /usr/sbin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x72d) [0x56234e381f0d] /usr/sbin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x584) [0x56234e20a044] /usr/sbin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x451) [0x56234cd25551] /usr/sbin/mysqld(RefIterator<false>::Read()+0x17d) [0x56234d369ecd] /usr/sbin/mysqld(FilterIterator::Read()+0x18) [0x56234d35b618] /usr/sbin/mysqld(LimitOffsetIterator::Read()+0x7d) [0x56234d35b7ad] /usr/sbin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x2a3) [0x56234d1328c3] /usr/sbin/mysqld(Query_expression::execute(THD*)+0x30) [0x56234d132c70] /usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x189) [0x56234d0a8d89] /usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x1ef4) [0x56234d047a24] /usr/sbin/mysqld(Prepared_statement::execute(THD*, String*, bool)+0x6b0) [0x56234d07d770] /usr/sbin/mysqld(Prepared_statement::execute_loop(THD*, String*, bool)+0x12d) [0x56234d08245d] /usr/sbin/mysqld(mysqld_stmt_execute(THD*, Prepared_statement*, bool, unsigned long, PS_PARAM*)+0x1d4) [0x56234d082af4] /usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x159e) [0x56234d04ee5e] /usr/sbin/mysqld(do_command(THD*)+0x204) [0x56234d0518e4] /usr/sbin/mysqld(+0x13d2338) [0x56234d1c6338] /usr/sbin/mysqld(+0x2855f89) [0x56234e649f89] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fa39f478ac3] /lib/x86_64-linux-gnu/libc.so.6(+0x126a40) [0x7fa39f50aa40] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fa2b14d11d8): select * from `jobs` where `queue` = 'default' and ((`reserved_at` is null and `available_at` <= 1696980002) or (`reserved_at` <= 1696979912)) order by `id` asc limit 1 FOR UPDATE SKIP LOCKED Connection ID (thread ID): 703 Status: NOT_KILLED You may download the Percona XtraDB Cluster operations manual by visiting http://www.percona.com/software/percona-xtradb-cluster/. You may find information in the manual which will help you identify the cause of the crash. 2023-10-10T23:20:02.224072Z 703 [Note] [MY-000000] [WSREP] Initiating SST cancellation Log of wsrep recovery (--wsrep-recover): INFO: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery_verbose.tSIUOD' --pid-file='/var/lib/mysql/pxc-cluster-node-1-recover.pid' INFO: WSREP: Recovered position 087fd47b-56ee-11ee-b1f6-0e0dfb3ecf18:6972944
Our settings for nodes are as follows:
# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/run/mysqld/mysqld.sock
[mysqld]
server-id=1
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
max_allowed_packet = 128M
# pmm
slow_query_log=ON
log_output=FILE
long_query_time=0
log_slow_admin_statements=ON
log_slow_slave_statements=ON
log_slow_rate_limit=100
log_slow_rate_type='query'
slow_query_log_always_write_time=1
log_slow_verbosity='full'
slow_query_log_use_global_control='all'
userstat=ON
performance_schema=ON
performance-schema-instrument='statement/%=ON'
performance-schema-consumer-statements-digest=ON
innodb_monitor_enable=all
# Binary log expiration period is 604800 seconds, which equals 7 days
binlog_expire_logs_seconds=604800
######## wsrep ###############
# Path to Galera library
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address=gcomm:/server1,server2,server3
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# Slave thread to use
wsrep_slave_threads=8
wsrep_log_conflicts
# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node IP address
wsrep_node_address=server1
# Cluster name
wsrep_cluster_name=pxc-cluster
#If wsrep_node_name is not specified, then system hostname will be used
wsrep_node_name=pxc-cluster-node-1
#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode=PERMISSIVE
# SST method
wsrep_sst_method=xtrabackup-v2
[mysqld]
wsrep_provider_options=socket.ssl_key=server-key.pem;socket.ssl_cert=server-cert.pem;socket.ssl_ca=ca.pem
[sst]
encrypt=4
ssl-key=server-key.pem
ssl-ca=ca.pem
ssl-cert=server-cert.pem
Please help analyze this and fix it, maybe someone has already faced something like this!?