Percona XtraDB Cluster stucks when softfailed node returns to cluster

Hi
I have 3 nodes cluster PXC (Ver 8.0.28-19.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3) on Debian 11.

I`ve tested cluster performance and HA with sysbench and manual script - all was fine: stop and restart each node passed well.

But I have problems with production workload. When all DMLs and DQLs are served by 1 node, eg, node1 and Ive stopped, eg, node3 - cluster worked fine. But when Im trying to return node3 to PXC node1 stucks with many stucked Queries.
show status like 'wsrep_%'; # on problem node1

+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                    | Value                                                                                                                                          |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid           | 5e1d7698-1fb9-11ed-94e3-a3cf0985bb6d                                                                                                           |
| wsrep_protocol_version           | 10                                                                                                                                             |
| wsrep_last_applied               | 788814                                                                                                                                         |
| wsrep_last_committed             | 788814                                                                                                                                         |
| wsrep_monitor_status (L/A/C)     | [ (806139, 806122), (788814, 788814), (788814, 788814) ]                                                                                       |
| wsrep_replicated                 | 711155                                                                                                                                         |
| wsrep_replicated_bytes           | 350865256                                                                                                                                      |
| wsrep_repl_keys                  | 2184313                                                                                                                                        |
| wsrep_repl_keys_bytes            | 34542224                                                                                                                                       |
| wsrep_repl_data_bytes            | 266613970                                                                                                                                      |
| wsrep_repl_other_bytes           | 0                                                                                                                                              |
| wsrep_received                   | 94983                                                                                                                                          |
| wsrep_received_bytes             | 39284050                                                                                                                                       |
| wsrep_local_commits              | 711139                                                                                                                                         |
| wsrep_local_cert_failures        | 0                                                                                                                                              |
| wsrep_local_replays              | 0                                                                                                                                              |
| wsrep_local_send_queue           | 0                                                                                                                                              |
| wsrep_local_send_queue_max       | 16                                                                                                                                             |
| wsrep_local_send_queue_min       | 0                                                                                                                                              |
| wsrep_local_send_queue_avg       | 0.017697                                                                                                                                       |
| wsrep_local_recv_queue           | 2                                                                                                                                              |
| wsrep_local_recv_queue_max       | 27                                                                                                                                             |
| wsrep_local_recv_queue_min       | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg       | 0.0782334                                                                                                                                      |
| wsrep_local_cached_downto        | 2                                                                                                                                              |
| wsrep_flow_control_paused_ns     | 1311449241                                                                                                                                     |
| wsrep_flow_control_paused        | 5.47239e-06                                                                                                                                    |
| wsrep_flow_control_sent          | 0                                                                                                                                              |
| wsrep_flow_control_recv          | 4                                                                                                                                              |
| wsrep_flow_control_active        | false                                                                                                                                          |
| wsrep_flow_control_requested     | false                                                                                                                                          |
| wsrep_flow_control_interval      | [ 222, 277 ]                                                                                                                                   |
| wsrep_flow_control_interval_low  | 222                                                                                                                                            |
| wsrep_flow_control_interval_high | 277                                                                                                                                            |
| wsrep_flow_control_status        | OFF                                                                                                                                            |
| wsrep_cert_deps_distance         | 99.1923                                                                                                                                        |
| wsrep_apply_oooe                 | 0.705456                                                                                                                                       |
| wsrep_apply_oool                 | 0.0114578                                                                                                                                      |
| wsrep_apply_window               | 1.98774                                                                                                                                        |
| wsrep_apply_waits                | 0                                                                                                                                              |
| wsrep_commit_oooe                | 0                                                                                                                                              |
| wsrep_commit_oool                | 0                                                                                                                                              |
| wsrep_commit_window              | 1.00849                                                                                                                                        |
| wsrep_local_state                | 4                                                                                                                                              |
| wsrep_local_state_comment        | Synced                                                                                                                                         |
| wsrep_cert_index_size            | 145                                                                                                                                            |
| wsrep_cert_bucket_count          | 541                                                                                                                                            |
| wsrep_gcache_pool_size           | 1346893592                                                                                                                                     |
| wsrep_causal_reads               | 0                                                                                                                                              |
| wsrep_cert_interval              | 2.10996                                                                                                                                        |
| wsrep_open_transactions          | 16                                                                                                                                             |
| wsrep_open_connections           | 0                                                                                                                                              |
| wsrep_ist_receive_status         |                                                                                                                                                |
| wsrep_ist_receive_seqno_start    | 0                                                                                                                                              |
| wsrep_ist_receive_seqno_current  | 0                                                                                                                                              |
| wsrep_ist_receive_seqno_end      | 0                                                                                                                                              |
| wsrep_incoming_addresses         | 192.168.7.24:3306,192.168.7.10:3306,192.168.7.21:3306                                                                                          |
| wsrep_cluster_weight             | 3                                                                                                                                              |
| wsrep_desync_count               | 0                                                                                                                                              |
| wsrep_evs_delayed                |                                                                                                                                                |
| wsrep_evs_evict_list             |                                                                                                                                                |
| wsrep_evs_repl_latency           | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state                  | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid                 | 025de9a0-1fba-11ed-88db-c363df565c63                                                                                                           |
| wsrep_gmcast_segment             | 0                                                                                                                                              |
| wsrep_cluster_capabilities       |                                                                                                                                                |
| wsrep_cluster_conf_id            | 7                                                                                                                                              |
| wsrep_cluster_size               | 3                                                                                                                                              |
| wsrep_cluster_state_uuid         | 5e1d7698-1fb9-11ed-94e3-a3cf0985bb6d                                                                                                           |
| wsrep_cluster_status             | Primary                                                                                                                                        |
| wsrep_connected                  | ON                                                                                                                                             |
| wsrep_local_bf_aborts            | 0                                                                                                                                              |
| wsrep_local_index                | 0                                                                                                                                              |
| wsrep_provider_capabilities      | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name              | Galera                                                                                                                                         |
| wsrep_provider_vendor            | Codership Oy <info@codership.com> (modified by Percona <https://percona.com/>)                                                                 |
| wsrep_provider_version           | 4.11(a9008fc)                                                                                                                                  |
| wsrep_ready                      | ON                                                                                                                                             |
| wsrep_thread_count               | 33                                                                                                                                             |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> show status like 'wsrep_last%'; #from all cluster nodes

node1

+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| wsrep_last_applied   | 788814 |
| wsrep_last_committed | 788814 |
+----------------------+--------+

node2

+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| wsrep_last_applied   | 788833 |
| wsrep_last_committed | 788833 |
+----------------------+--------+

node3

+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| wsrep_last_applied   | 788833 |
| wsrep_last_committed | 788833 |
+----------------------+--------+

On node1:

> SELECT count(*),LOCK_TYPE FROM performance_schema.metadata_locks group by LOCK_TYPE;
+----------+---------------------+
| count(*) | LOCK_TYPE           |
+----------+---------------------+
|     1302 | SHARED_READ         |
|     2690 | INTENTION_EXCLUSIVE |
|     2690 | SHARED_WRITE        |
+----------+---------------------+

In error.log on node1 only Too many connections
When node3 were joining:

2022-08-19T17:30:23.956369+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/gmcast.cpp:handle_established():671: (025de9a0-88db, 'tcp://192.168.7.24:4567') connection established to 7628622f-883e tcp://192.168.7.10:4567
2022-08-19T17:30:23.956566+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/gmcast.cpp:check_liveness():1258: (025de9a0-88db, 'tcp://192.168.7.24:4567') turning message relay requesting on, nonlive peers: 
2022-08-19T17:30:24.457181+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/gmcast.cpp:handle_stable_view():1763: declaring 7628622f-883e at tcp://192.168.7.10:4567 stable
2022-08-19T17:30:24.457222+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/gmcast.cpp:handle_stable_view():1763: declaring f3da9638-ac79 at tcp://192.168.7.21:4567 stable
2022-08-19T17:30:24.457889+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/pc_proto.cpp:is_prim():864: Node 025de9a0-88db state primary
2022-08-19T17:30:24.458483+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/pc_proto.cpp:deliver_view():257: Current view of cluster as seen by this node
view (view_id(PRIM,025de9a0-88db,7)
memb {
        025de9a0-88db,0
        7628622f-883e,0
        f3da9638-ac79,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2022-08-19T17:30:24.458520+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/pc.cpp:handle_up():30: Save the discovered primary-component to disk
2022-08-19T17:30:24.460268+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_comp_msg():549: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2022-08-19T17:30:24.460408+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_core.cpp:core_handle_comp_msg():844: STATE_EXCHANGE: sent state UUID: 7675bd9c-1fcb-11ed-8901-c36f59fbaa57
2022-08-19T17:30:24.461102+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_core.cpp:core_handle_uuid_msg():942: STATE EXCHANGE: sent state msg: 7675bd9c-1fcb-11ed-8901-c36f59fbaa57
2022-08-19T17:30:24.461622+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():740: STATE EXCHANGE: got state msg: 7675bd9c-1fcb-11ed-8901-c36f59fbaa57 from 0 (frxde-votes-mysql01)
2022-08-19T17:30:24.461663+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():740: STATE EXCHANGE: got state msg: 7675bd9c-1fcb-11ed-8901-c36f59fbaa57 from 2 (frxde-votes-mysql02)
2022-08-19T17:30:24.957422+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():740: STATE EXCHANGE: got state msg: 7675bd9c-1fcb-11ed-8901-c36f59fbaa57 from 1 (frxde-votes-mysql03)
2022-08-19T17:30:24.957481+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_post_state_exchange():486: Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 6,
        members    = 2/3 (primary/total),
        act_id     = 788814,
        last_appl. = 788743,
        protocols  = 2/10/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = 5e1d7698-1fb9-11ed-94e3-a3cf0985bb6d
2022-08-19T17:30:24.957527+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs.cpp:_set_fc_limits():971: Flow-control interval: [222, 277]
2022-08-19T17:30:24.957693+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:process_prim_conf_change():3147: ####### processing CC 788815, local, ordered
2022-08-19T17:30:24.957713+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:drain_monitors_for_local_conf_change():2651: Maybe drain monitors from 788814 upto current CC event 788815 upto:788814
2022-08-19T17:30:24.957722+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:drain_monitors_for_local_conf_change():2657: Drain monitors from 788814 up to 788814
2022-08-19T17:30:24.957733+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:process_prim_conf_change():3165: ####### My UUID: 025de9a0-1fba-11ed-88db-c363df565c63
2022-08-19T17:30:24.957741+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:reset_index_if_needed():2967: Skipping cert index reset
2022-08-19T17:30:24.957750+03:00 29 [Note] [MY-000000] [Galera] galera/src/replicator_smm.cpp:establish_protocol_versions():2494: REPL Protocols: 10 (5)
2022-08-19T17:30:24.957758+03:00 29 [Note] [MY-000000] [Galera] galera/src/certification.cpp:adjust_position():1109: ####### Adjusting cert position: 788814 -> 788815
2022-08-19T17:30:24.957841+03:00 0 [Note] [MY-000000] [Galera] galera/src/galera_service_thd.cpp:thd_func():49: Service thread queue flushed.
2022-08-19T17:30:24.959660+03:00 29 [Note] [MY-000000] [Galera] ================================================
View:
  id: 5e1d7698-1fb9-11ed-94e3-a3cf0985bb6d:788815
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(3):
        0: 025de9a0-1fba-11ed-88db-c363df565c63, frxde-votes-mysql01
        1: 7628622f-1fcb-11ed-883e-762ce4adfa5d, frxde-votes-mysql03
        2: f3da9638-1fc5-11ed-ac79-efb20f047c01, frxde-votes-mysql02
=================================================
2022-08-19T17:30:24.959737+03:00 29 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2022-08-19T17:30:25.643275+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:group_select_donor():1822: Member 1.0 (frxde-votes-mysql03) requested state transfer from 'frxde-votes-mysql02,'. Selected 2.0 (frxde-votes-mysql02)(SYNCED) as donor.
2022-08-19T17:30:27.197304+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1234: 2.0 (frxde-votes-mysql02): State transfer to 1.0 (frxde-votes-mysql03) complete.
2022-08-19T17:30:27.197909+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_sync_msg():1287: Member 2.0 (frxde-votes-mysql02) synced with group.
2022-08-19T17:30:27.305708+03:00 0 [Note] [MY-000000] [Galera] gcomm/src/gmcast.cpp:check_liveness():1299: (025de9a0-88db, 'tcp://192.168.7.24:4567') turning message relay requesting off
2022-08-19T17:30:38.233641+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1234: 1.0 (frxde-votes-mysql03): State transfer from 2.0 (frxde-votes-mysql02) complete.
2022-08-19T17:30:38.234187+03:00 0 [Note] [MY-000000] [Galera] gcs/src/gcs_group.cpp:gcs_group_handle_sync_msg():1287: Member 1.0 (frxde-votes-mysql03) synced with group.

After join:

2022-08-19T17:30:38.414375+03:00 94383 [Warning] [MY-000000] [Server] Too many connections
2022-08-19T17:30:38.419851+03:00 94384 [Warning] [MY-000000] [Server] Too many connections

DMLs are IODKU: insert into comment_votes (story_id,comment_id,user_id,time,vote,is_disabled) values (?,?,?,...) on duplicate key update time = ?,vote = ?,is_disabled = ?
and
insert into story_votes (story_id,user_id,ip,time,vote,is_disabled,is_karma_update) values (?,?,?,...) on duplicate key update ip = ?,time = ?,vote = ?,is_disabled = ?,is_karma_update = ?, tables with partitions

I’ve tried to change table schema from:

Create Table: CREATE TABLE `story_votes` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `story_id` int unsigned NOT NULL,
  `user_id` int unsigned NOT NULL,
  `ip` varchar(255) NOT NULL,
  `time` int unsigned NOT NULL,
  `vote` tinyint NOT NULL,
  `is_disabled` tinyint NOT NULL,
  `is_karma_update` tinyint NOT NULL,
  PRIMARY KEY (`id`,`story_id`),
  UNIQUE KEY `story_user_uidx` (`story_id`,`user_id`),
  KEY `story_votes_user_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2629721697 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`story_id`)

to

CREATE TABLE `story_votes` ( 
 `story_id` int unsigned NOT NULL, 
 `user_id` int unsigned NOT NULL, 
 `ip` varchar(255) NOT NULL, 
 `time` int unsigned NOT NULL, 
 `vote` tinyint NOT NULL, 
 `is_disabled` tinyint NOT NULL, 
 `is_karma_update` tinyint NOT NULL, 
 PRIMARY KEY (`story_id`,`user_id`), 
 KEY `story_votes_user_idx` (`user_id`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 
/*!50100 PARTITION BY RANGE (`story_id`)

But schema changes didn’t help (((

I described situation without proxysql for simpler understanding. With proxysql I used single_writer configuration:

> select * from runtime_mysql_galera_hostgroups\G
*************************** 1. row ***************************
       writer_hostgroup: 101
backup_writer_hostgroup: 102
       reader_hostgroup: 103
      offline_hostgroup: 666
                 active: 1
            max_writers: 1
  writer_is_also_reader: 1
max_transactions_behind: 100
                comment: frxde-votes-cluster

And problem the same((( ProxySQL didn’t detect problem on node1 and tried to forward trafic on problem node.
After systemctl reload mysql in error.log many MDL conflicts

2022-08-22T10:51:05.168838+03:00 1364675 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=10 solved by abort
2022-08-22T10:51:05.168844+03:00 1364675 [Note] [MY-000000] [WSREP] MDL conflict db= table= ticket=10 solved by abort

/etc/mysql/mysql.conf.d/mysqld.cnf

[mysqld]
memlock
server_id       = 7024
binlog_format   = ROW
read_only       = OFF
super_read_only = OFF

gtid_mode       = ON
enforce_gtid_consistency        = ON

pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
log-error       = /var/log/mysql/error.log

skip-name-resolve
skip_replica_start      = ON
skip-host-cache
skip-external-locking
skip-networking       = 0

log_timestamps  = SYSTEM
skip-external-locking
skip-name-resolve=1

innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT 

max_connections = 1000
wait_timeout    = 40
connect_timeout = 10
concurrent_insert       =2
tmp_table_size  = 256M
max_heap_table_size     = 256M
innodb_buffer_pool_size = 80G
##
innodb_log_buffer_size=512M
innodb_log_file_size=1G
innodb_log_files_in_group=20
innodb_file_per_table
innodb_buffer_pool_instances=64
innodb_thread_concurrency = 16
thread_pool_size=36
innodb_io_capacity=3000
innodb_io_capacity_max=4000
sql-mode = NO_ENGINE_SUBSTITUTION 
innodb_file_per_table 
sort_buffer_size=256M
read_rnd_buffer_size=256M
log-bin=mysql-bin
log_replica_updates=ON
binlog_expire_logs_seconds=86400 
replica_load_tmpdir = /var/tmp
tmpdir = /var/tmp

# logging
slow_query_log                    = ON
innodb_print_all_deadlocks        = ON
slow_query_log_file               = /var/log/mysql/slow.log
log_output                        = FILE
long_query_time                   = 0
log_slow_admin_statements         = ON
log_slow_replica_statements       = ON
log_slow_rate_limit               = 100
log_slow_rate_type                = 'query'
slow_query_log_always_write_time  = 0.1
log_slow_verbosity                = 'full'
slow_query_log_use_global_control = 'all'

#Wsrep
pxc_encrypt_cluster_traffic=OFF
wsrep_provider = "/usr/lib/galera4/libgalera_smm.so"
innodb_autoinc_lock_mode=2
wsrep_cluster_address = "gcomm://192.168.7.21:4567,192.168.7.10:4567,192.168.7.24:4567"
wsrep_cluster_name = "pikabu-votes-cluster"
wsrep_node_name = "frxde-votes-mysql01"
wsrep_node_address = "192.168.7.24"
wsrep_sst_method = xtrabackup-v2
pxc_strict_mode = ENFORCING
wsrep_applier_threads = 32
wsrep_applier_UK_checks = ON
wsrep_log_conflicts = ON
wsrep_sst_receive_address = "192.168.7.24:4444"
wsrep_sst_donor=frxde-votes-mysql03,
wsrep_max_ws_size = 2147483647
wsrep_retry_autocommit = 1000
wsrep_provider_options = "gmcast.listen_addr=tcp://192.168.7.24:4567; ist.recv_addr=192.168.7.24:4568; gcache.size=30G; pc.checksum=true; evs.version=1; evs.auto_evict=0; gcs.fc_limit=160; gcs.fc_factor=0.8; gcs.max_packet_size=64500; cert.log_conflicts=YES; cert.optimistic_pa=YES; debug=yes;"

[sst]
rlimit = 80m
compressor = "pigz -3"
decompressor = "pigz -dc"
backup_threads = 4

Please help me solve the problem

1 Like