Cluster node stuck with long semaphore wait in buf0buf.cc line 789

Oleksandr_Bezpiatov · February 21, 2022, 4:57pm

Hi, today we hit another issue, on a different cluster.

We have a temporary architecture for migrating from the old infrastructure configuration to the new one, so we have a Percona XtraDB Cluster replicating data from another data center.

The topology looks like this: another_dc_master -> [ pxc1.new, pxc2.new, pxc3.new ]
GTID is disabled.

Replication got stuck on pxc1.new with these symptoms:

the replication lag kept increasing; judging by the Exec_Master_Log_Pos counter, commits were still being applied, but apparently too slowly
after a few minutes the mysql client could still connect, but could not execute any commands locally
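
For anyone who wants to tell "applying, but too slowly" apart from "fully stalled" without eyeballing the counters, here is a minimal sketch. It assumes you have already collected a few consecutive replica status samples as dictionaries, using the field names Exec_Master_Log_Pos and Seconds_Behind_Master as they appear in SHOW SLAVE STATUS. The function name and the returned labels are hypothetical, and it assumes all samples come from the same source binlog file (the position restarts when the file rotates).

```python
from typing import Dict, List


def classify_replica(samples: List[Dict[str, int]]) -> str:
    """Classify replica progress from status samples ordered oldest first.

    Assumption: every sample was taken while the replica was reading the
    same source binlog file, so positions are directly comparable.
    """
    if len(samples) < 2:
        return "unknown"  # not enough data to see a trend

    first, last = samples[0], samples[-1]
    pos_advanced = last["Exec_Master_Log_Pos"] > first["Exec_Master_Log_Pos"]
    lag_grew = last["Seconds_Behind_Master"] > first["Seconds_Behind_Master"]

    if not pos_advanced:
        return "stalled"  # the applier made no progress at all
    if lag_grew:
        return "applying_but_falling_behind"  # the symptom described above
    return "keeping_up"
```

With two samples where the position moves from 1000 to 1500 while the lag grows from 60 to 300 seconds, this returns "applying_but_falling_behind", which matches the first symptom.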

The relevant part of the log is below:

2022-02-21T16:22:11.640748Z 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 139623264241408 has waited at trx0rseg.ic line 50 for 241 seconds the semaphore:
X-lock on RW-latch at 0x7ef1b43a2690 created in file buf0buf.cc line 789
a writer (thread id 139623264831232) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.23-14/storage/innobase/include/trx0rseg.ic line 50
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2022-02-21 16:22:20 0x7eef3d97d700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 3 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2704167 srv_active, 0 srv_shutdown, 310 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 7175967
--Thread 139564786370304 has waited at trx0types.h line 193 for 178 seconds the semaphore:
Mutex at 0x7efca9c1e378, Mutex UNDO_SPACE_RSEG created trx0rseg.cc:235, lock var 1

The rest of the log consisted of the InnoDB engine status output being logged repeatedly.
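
Since the log repeats the same status output many times, it can help to reduce it to the longest wait seen per thread and source location. The following is only a sketch: it assumes the wait lines have exactly the format shown in the excerpt above ("--Thread N has waited at FILE line L for S seconds"), and the function name is hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:
# --Thread 139623264241408 has waited at trx0rseg.ic line 50 for 241 seconds the semaphore:
WAIT_RE = re.compile(
    r"--Thread (\d+) has waited at (\S+) line (\d+) for (\d+) seconds"
)


def longest_waits(log_text: str) -> Dict[Tuple[str, str], int]:
    """Return the longest wait in seconds per (thread id, 'file:line')."""
    longest: Dict[Tuple[str, str], int] = {}
    for match in WAIT_RE.finditer(log_text):
        thread_id, src_file, src_line, seconds = match.groups()
        key = (thread_id, f"{src_file}:{src_line}")
        # Keep only the maximum, because the same wait is reported repeatedly.
        longest[key] = max(longest.get(key, 0), int(seconds))
    return longest
```

Run against the excerpt above, this would report thread 139623264241408 waiting 241 seconds at trx0rseg.ic:50 and thread 139564786370304 waiting 178 seconds at trx0types.h:193.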

Versions:

mysql  Ver 8.0.23-14.1 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel14, Revision d3b9a1d, WSREP version 26.4.3)

What could it be?

Thank you.

Oleksandr_Bezpiatov · February 21, 2022, 5:15pm

Additionally, here are some more subjective observations:

We have several clusters with a similar configuration, but we observe this problem only in one specific case: on a cluster where the pxc1 node works as a REPLICA of an external MySQL master and the cluster has more than one node (usually 3).

The problem does not occur after resetting the slave on pxc1, and it does not occur when pxc1 runs in REPLICA mode but standalone (only 1 active node in the cluster).

Sebastian_Paul · August 24, 2022, 8:55am

Hi Oleksandr_Bezpiatov,

I think we have pretty much the same problem. From time to time one of the nodes crashes; the last time was just yesterday.

2022-08-23T18:14:15.304603+02:00 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 139879230437120 has waited at trx0rseg.ic line 50 for 240 seconds the semaphore:
X-lock on RW-latch at 0x7f274ddee3b0 created in file buf0buf.cc line 793
a writer (thread id 139885579880192) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.28-19/storage/innobase/include/trx0rseg.ic line 50
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2022-08-23 18:14:17 139777349252864 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 0 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1656252 srv_active, 0 srv_shutdown, 229 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 482520550
--Thread 139879230437120 has waited at trx0rseg.ic line 50 for 242 seconds the semaphore:
X-lock on RW-latch at 0x7f274ddee3b0 created in file buf0buf.cc line 793
a writer (thread id 139885579880192) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/percona-xtradb-cluster-8.0.28-19/storage/innobase/include/trx0rseg.ic line 50
OS WAIT ARRAY INFO: reservation count 0
OS WAIT ARRAY INFO: reservation count 484854131
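
To see who is blocking whom in output like this, the waiting thread can be paired with the thread reported as holding the latch. This is a rough sketch only: it assumes the "--Thread N has waited at" line is followed, within the same block, by an "a writer (thread id M) has reserved it" line, as in the excerpt above. The function name is hypothetical.

```python
import re
from typing import Set, Tuple

WAITER_RE = re.compile(r"--Thread (\d+) has waited at")
HOLDER_RE = re.compile(r"a writer \(thread id (\d+)\) has reserved it")


def blocking_pairs(log_text: str) -> Set[Tuple[str, str]]:
    """Return the unique (waiting thread id, holding thread id) pairs."""
    pairs: Set[Tuple[str, str]] = set()
    waiter = None
    for line in log_text.splitlines():
        waiter_match = WAITER_RE.search(line)
        if waiter_match:
            # Start of a new wait block; remember who is waiting.
            waiter = waiter_match.group(1)
            continue
        holder_match = HOLDER_RE.search(line)
        if holder_match and waiter is not None:
            pairs.add((waiter, holder_match.group(1)))
            waiter = None  # each wait block yields at most one pair
    return pairs
```

For the excerpt above this gives a single pair: thread 139879230437120 waiting on thread 139885579880192. Mutex waits (like the UNDO_SPACE_RSEG one in the first post) do not name a holder in this format, so they are skipped.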

Have you been able to solve your issue? We have already opened a bug report, but unfortunately we are not able to reproduce it reliably. ([PXC-3919] Cluster node completely locks in trx0undo.ic line 126 - Percona JIRA)

We don’t use a cluster node as a replica, but as a source for several replica nodes (MySQL Community Edition).
