Nodes aborted one by one when creating/dropping an index

Hi all,

I have a 3-node cluster.

mysql> SHOW VARIABLES LIKE 'version%';
+-------------------------+---------------------------------------------------------------------------------------+
| Variable_name           | Value                                                                                 |
+-------------------------+---------------------------------------------------------------------------------------+
| version                 | 8.0.37-29.1                                                                           |
| version_comment         | Percona XtraDB Cluster (GPL), Release rel29, Revision d29a325, WSREP version 26.1.4.3 |
| version_compile_machine | x86_64                                                                                |
| version_compile_os      | Linux                                                                                 |
| version_compile_zlib    | 1.2.13                                                                                |
| version_suffix          | .1                                                                                    |
+-------------------------+---------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)

In front of those 3 nodes, I have ProxySQL:

+--------------+------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname   | port | gtid_port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | prod-db-01 | 3306 | 0         | SHUNNED | 10000  | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | prod-db-02 | 3306 | 0         | ONLINE  | 10000  | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | prod-db-03 | 3306 | 0         | SHUNNED | 9999   | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | prod-db-01 | 3306 | 0         | ONLINE  | 10000  | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | prod-db-03 | 3306 | 0         | ONLINE  | 9999   | 0           | 1000            | 0                   | 0       | 0              |         |
| 30           | prod-db-01 | 3306 | 0         | ONLINE  | 10000  | 0           | 1000            | 0                   | 0       | 0              |         |
| 30           | prod-db-03 | 3306 | 0         | ONLINE  | 9999   | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

I applied a read/write split with these query rules:

*************************** 1. row ***************************
rule_id: 100
active: 1
username: NULL
schemaname: NULL
flagIN: 0
client_addr: NULL
proxy_addr: NULL
proxy_port: NULL
digest: NULL
match_digest: NULL
match_pattern: ^SELECT .* FOR UPDATE
negate_match_pattern: 0
re_modifiers: CASELESS
flagOUT: NULL
replace_pattern: NULL
destination_hostgroup: 10
cache_ttl: NULL
cache_empty_result: NULL
cache_timeout: NULL
reconnect: NULL
timeout: NULL
retries: NULL
delay: NULL
next_query_flagIN: NULL
mirror_flagOUT: NULL
mirror_hostgroup: NULL
error_msg: NULL
OK_msg: NULL
sticky_conn: NULL
multiplex: NULL
gtid_from_hostgroup: NULL
log: NULL
apply: 1
attributes:
comment: NULL
*************************** 2. row ***************************
rule_id: 200
active: 1
username: NULL
schemaname: NULL
flagIN: 0
client_addr: NULL
proxy_addr: NULL
proxy_port: NULL
digest: NULL
match_digest: NULL
match_pattern: ^SELECT .*
negate_match_pattern: 0
re_modifiers: CASELESS
flagOUT: NULL
replace_pattern: NULL
destination_hostgroup: 30
cache_ttl: NULL
cache_empty_result: NULL
cache_timeout: NULL
reconnect: NULL
timeout: NULL
retries: NULL
delay: NULL
next_query_flagIN: NULL
mirror_flagOUT: NULL
mirror_hostgroup: NULL
error_msg: NULL
OK_msg: NULL
sticky_conn: NULL
multiplex: NULL
gtid_from_hostgroup: NULL
log: NULL
apply: 1
attributes:
comment: NULL
*************************** 3. row ***************************
rule_id: 300
active: 1
username: NULL
schemaname: NULL
flagIN: 0
client_addr: NULL
proxy_addr: NULL
proxy_port: NULL
digest: NULL
match_digest: NULL
match_pattern: .*
negate_match_pattern: 0
re_modifiers: CASELESS
flagOUT: NULL
replace_pattern: NULL
destination_hostgroup: 10
cache_ttl: NULL
cache_empty_result: NULL
cache_timeout: NULL
reconnect: NULL
timeout: NULL
retries: NULL
delay: NULL
next_query_flagIN: NULL
mirror_flagOUT: NULL
mirror_hostgroup: NULL
error_msg: NULL
OK_msg: NULL
sticky_conn: NULL
multiplex: NULL
gtid_from_hostgroup: NULL
log: NULL
apply: 1
attributes:
comment: NULL
3 rows in set (0.00 sec)

Before the crash, we ran a script directly on Node02 (over SSH), but it failed because of a missing index:

ERROR 1822 (HY000) at line 94: Failed to add the foreign key constraint. Missing index for constraint 'fk_ms_shortname' in the referenced table 'catalogitems'
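
For context, MySQL raises error 1822 when the referenced (parent) column of a foreign key is not covered by a usable index on the referenced table. A minimal sketch of how that could be checked before rerunning the script follows. The schema name `mydb` is a placeholder and the column name `shortname` is only a guess from the constraint name; neither appears in the post.

```sql
-- List the existing indexes on the referenced table named in the error.
SHOW INDEX FROM catalogitems;

-- The same check through information_schema.
-- 'mydb' and 'shortname' are hypothetical names used for illustration.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'catalogitems'
  AND COLUMN_NAME = 'shortname'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```

For a single-column foreign key, if no index lists the referenced column with SEQ_IN_INDEX = 1, the constraint probably cannot be created until such an index exists.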

Then we created and dropped indexes through DBeaver while users were still active on the system (running queries and updates against the DB), and the cluster crashed around 2025-07-10T12:45.

These are the log files for Node01, Node02, and Node03:
Node01 log
Node02 log
Node03 log

To bring the cluster back, I had to restart Node01 and Node03 (the nodes that aborted), after which SST started.
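
After a restart and SST like this, one way to confirm that every node has rejoined is to look at the standard wsrep status variables on each node. This is only a sketch of the check, not output from this cluster:

```sql
-- Run on each node. On a healthy 3-node cluster one would expect
-- wsrep_cluster_size = 3, wsrep_cluster_status = Primary,
-- wsrep_local_state_comment = Synced and wsrep_ready = ON.
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_cluster_size',
   'wsrep_cluster_status',
   'wsrep_local_state_comment',
   'wsrep_ready');
```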

Please help me identify the root cause. I tried to reproduce the issue by running queries while creating and dropping indexes, but could not get the same result (nodes aborting).

Thanks.

You could be affected by this bug:

I would suggest upgrading to the fixed version (8.0.36-28) or better yet the latest release.

For DDL operations, we highly recommend pt-online-schema-change with the TOI method.

Alternatively, check this blog post for the RSU method:
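
As a rough sketch of what the RSU approach looks like in practice: the DDL is applied on one node at a time, in a session connected directly to that node rather than through ProxySQL, and is not replicated to the others. The index name `idx_shortname` and the column `shortname` are hypothetical and used only for illustration.

```sql
-- Run on ONE node at a time, connected directly to that node.
-- With RSU the statement is applied locally only, so it must be
-- repeated on every node in turn.
SET SESSION wsrep_OSU_method = 'RSU';

-- Hypothetical DDL; substitute the real index definition.
ALTER TABLE catalogitems ADD INDEX idx_shortname (shortname);

-- Return the session to the default method afterwards.
SET SESSION wsrep_OSU_method = 'TOI';
```

RSU is generally suitable only for changes where nodes can keep replicating while the old and new schema versions coexist, so it is worth testing on a non-production cluster first.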

I also noticed that the FK appears to be dropped while you are creating a full-text index, so you may be making several changes at once, including dropping tables, and triggers or other activity may be running at the same time.