I have a Percona XtraDB Cluster setup with 3 nodes on Ubuntu 20.04 LTS.
Server version: 8.0.22-13.1 Percona XtraDB Cluster (GPL), Release rel13, Revision a48e6d5, WSREP version 26.4.3
On node 1, a cron job launches an update procedure every hour. Sometimes I see that an SQL UPDATE query has frozen in the state “wsrep: replicating and certifying write set(-1)”.
Full SQL statement:
UPDATE goods SET params=CAST('{\"warranty_months\":24,\"type_case\":60,\"polarity\":57,\"type_cleat\":54,\"type_akb\":45,\"length\":242,\"width\":175,\"height\":190,\"current\":600,\"capacity\":60,\"tech\":48,\"start_stop\":null,\"promo\":51}' AS JSON), updated_at='2021-05-21 12:24:49' WHERE id=1713
The update procedure runs a large number of such statements with similar data (several hundred) in one transaction. Each statement selects its row by primary key (WHERE id=1713). The goods table is not large: it has only 1100 records.
When a request freezes on node 1, the other nodes write this message to /var/log/mysql/error.log:
2021-05-21T20:40:00.892654Z 0 [Warning] [MY-000000] [Galera] Failed to report last committed 796058d9-b8a2-11eb-9072-c25e8ba7694b:26442, -110 (Connection timed out)
If I force-restart node 2 or node 3, node 1 successfully completes the hung request and continues working. The problem can recur in the next cycle of the update procedure, or only after a day or more.
Do I need to reduce the number of requests in one transaction?
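To make the question concrete, here is a minimal sketch of what I mean by reducing the number of requests per transaction: the same UPDATEs, but with a COMMIT after every batch instead of one COMMIT at the end. Everything in it is an assumption for illustration: Python as the client language, a DB-API 2.0 (PEP 249) connection with autocommit off and the %s parameter style, and the hypothetical names chunked, update_in_batches and BATCH_SIZE. I do not know whether this addresses the hang.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical batch size; the right value for this cluster is unknown.
BATCH_SIZE = 50

# One row to update: (id, params as a JSON string, updated_at timestamp).
Row = Tuple[int, str, str]


def chunked(rows: Sequence[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def update_in_batches(conn, rows: Sequence[Row], size: int = BATCH_SIZE) -> int:
    """Run the UPDATEs with one COMMIT per batch; return the number of commits.

    `conn` is assumed to be a PEP 249 connection with autocommit disabled.
    """
    commits = 0
    for batch in chunked(rows, size):
        cur = conn.cursor()
        try:
            for row_id, params, updated_at in batch:
                cur.execute(
                    "UPDATE goods SET params = CAST(%s AS JSON), "
                    "updated_at = %s WHERE id = %s",
                    (params, updated_at, row_id),
                )
            # Each batch becomes its own, smaller transaction.
            conn.commit()
            commits += 1
        except Exception:
            # Undo only the current, uncommitted batch.
            conn.rollback()
            raise
        finally:
            cur.close()
    return commits
```

Note the trade-off: with per-batch commits, a failure midway leaves the earlier batches committed, so the procedure is no longer all-or-nothing.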
I would appreciate any recommendations for stabilizing the cluster.