January 17, 2022 - Find release notes at Percona XtraDB Cluster 8.0.26-16.1
@Aleksandra_Abramova
Can you clarify? I have had trouble tracking this issue.
Is
https://jira.percona.com/browse/PXC-3706
the same as
percona:8.0 ← satya-bodapati:PXC-8.0-3776
opened 10:58AM - 01 Dec 21 UTC
PXC-3724 : PXC crashes with long semaphore wait
https://jira.percona.com/browse/PXC-3776
https://jira.percona.com/browse/PXC-3724
Problem:
Under heavy R/W parallel workload, after some time all client sessions
are blocked and the server eventually fails with
[FATAL] Semaphore wait has lasted > 600 seconds.
Cause:
Client session 1 (TH1): Execute COMMIT (transaction 1).
During the prepare phase (before certification) it is BF aborted by an HP
thread. The was_chosen_as_wsrep_victim flag is set to true.
It does not get into the commit phase and ha_rollback_trans() is called,
but the was_chosen_as_wsrep_victim flag is not cleaned up.
Transaction is finished.
Another scenario: The transaction in TH1 is waiting to acquire table MDL.
This waiting trx is chosen as victim and BF aborted by a High Priority
transaction. In such a case, we set was_chosen_as_wsrep_victim to true.
Since the transaction is not started by InnoDB, the server will call
neither commit nor rollback. The flag is left over and a future transaction
in the same session will 'carry' this flag.
Client session 1 (TH1): Execute COMMIT (transaction 2).
It has was_chosen_as_wsrep_victim==true (as not cleaned up before).
It certifies and then enters commit phase. It adds itself to
the wsrep_group_commit_queue and to the mysql group commit queue.
Other threads are added to these queues simultaneously as well
(before and after TH1).
The leader starts processing the mysql group commit queue. It gets to TH1.
The transaction is committed, but as was_chosen_as_wsrep_victim==true
we skip the call to trx_sys_update_wsrep_checkpoint() (and the internal
call to wsrep_unregister_from_group_commit() for this thread).
So TH1 is still at the beginning of wsrep_group_commit_queue.
The leader goes to the next thread in the mysql group commit queue.
It calls trx_sys_update_wsrep_checkpoint() which internally calls
wsrep_wait_for_turn_in_group_commit(). This blocks, as TH1 is still
in front of wsrep_group_commit_queue and the processing thread is the
2nd one in the wsrep_group_commit_queue.
The result is that the leader thread waits forever, blocking all client
threads.
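The ordering problem described above can be illustrated with a toy model. This is a simplified, single-threaded Python sketch and not PXC code: `Trx`, `run_leader` and `simulate` are hypothetical names, and the real server uses threads and waits rather than a loop. It only shows why a stale was_chosen_as_wsrep_victim flag on TH1 leaves the next transaction waiting for a turn that never comes.

```python
from collections import deque


class Trx:
    """Toy stand-in for a committing client session (hypothetical)."""

    def __init__(self, name, was_chosen_as_wsrep_victim=False):
        self.name = name
        self.was_chosen_as_wsrep_victim = was_chosen_as_wsrep_victim


def run_leader(trxs):
    """Walk the mysql group commit queue the way the leader would.

    Returns (committed_names, blocked), where blocked is None if the leader
    finishes, or (waiting_trx, stale_head_trx) if it would wait forever.
    """
    mysql_queue = list(trxs)   # order processed by the leader
    wsrep_queue = deque(trxs)  # same order; only the head may proceed
    committed = []
    for trx in mysql_queue:
        if trx.was_chosen_as_wsrep_victim:
            # Stale flag: the commit happens, but the checkpoint update and
            # the unregister step are skipped, so trx stays in wsrep_queue.
            committed.append(trx.name)
            continue
        # Models waiting for a turn: only the queue head is allowed through.
        if wsrep_queue[0] is not trx:
            return committed, (trx.name, wsrep_queue[0].name)
        wsrep_queue.popleft()  # models unregistering from the queue
        committed.append(trx.name)
    return committed, None


def simulate(stale_flag):
    """TH1 sits between two other sessions, as in the description."""
    trxs = [
        Trx("TH0"),
        Trx("TH1", was_chosen_as_wsrep_victim=stale_flag),
        Trx("TH2"),
    ]
    return run_leader(trxs)


if __name__ == "__main__":
    print(simulate(stale_flag=True))
    print(simulate(stale_flag=False))
```

With the stale flag set, the model reports TH2 blocked behind TH1; with the flag clear, all three sessions commit.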
Solution:
1. Clean was_chosen_as_wsrep_victim flag when the transaction is being
rolled back.
2. For the scenario where the victim is chosen before table opening or
before transaction start, we set this victim flag only if the transaction
has been started.
An MTR test is not provided, as the scenario needs multiple client
sessions and an applier thread executing group commit logic simultaneously,
which is problematic to model in an MTR test.
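The two changes listed under Solution can be pictured as a change in the lifecycle of the victim flag. The following is a minimal Python sketch under stated assumptions, not the actual patch: the `Session` class, its methods and the two constructor switches are hypothetical and only mirror the behaviour described in the text.

```python
class Session:
    """Toy model of one client session and its victim flag (hypothetical)."""

    def __init__(self, fixed):
        # fixed=False models the behaviour described under "Cause",
        # fixed=True models the two changes described under "Solution".
        self.fixed = fixed
        self.trx_started = False
        self.was_chosen_as_wsrep_victim = False

    def begin(self):
        self.trx_started = True

    def bf_abort(self):
        # Change 2: only mark the victim if a transaction has been started.
        if self.fixed and not self.trx_started:
            return
        self.was_chosen_as_wsrep_victim = True

    def rollback(self):
        # Change 1: clean the flag when the transaction is rolled back.
        if self.fixed:
            self.was_chosen_as_wsrep_victim = False
        self.trx_started = False


def scenario_rollback(fixed):
    """BF abort during prepare, followed by a rollback."""
    s = Session(fixed)
    s.begin()
    s.bf_abort()
    s.rollback()
    return s.was_chosen_as_wsrep_victim  # flag seen by the next transaction


def scenario_mdl_wait(fixed):
    """BF abort while waiting for a table MDL, before the trx is started."""
    s = Session(fixed)
    s.bf_abort()  # neither commit nor rollback follows
    return s.was_chosen_as_wsrep_victim


if __name__ == "__main__":
    for fixed in (False, True):
        print(fixed, scenario_rollback(fixed), scenario_mdl_wait(fixed))
```

In this model, both scenarios leave the flag set for the next transaction when `fixed` is false, and neither does when it is true.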
This got merged to several other items, and all of them “appear” to be closed…
It reads to me like it is…
Hello, meyerder!
I will ask the engineers to look into this. To get a faster response, I suggest posting this directly on the issue or leaving a comment on GitHub.
Hi,
Both problems had the same symptoms (server hangs) and were related to the same part of the code, so from this angle they are “the same”.
But PXC-3706 and PXC-3776 had different root causes, so in that sense they are not “the same”.