WSREP: cluster conflict due to certification failure

Hi!

I can't figure out why I keep getting “WSREP: cluster conflict due to certification failure” errors. I hope someone can help, because I'm at my wits' end.

The cluster has 4 nodes across two datacenters. The load peaks at about 200 RPM for tbl1 and 250 RPM for tbl2, which does not seem high.

Here is some system info:

mysql> SELECT @@version,@@version_comment;
+------------------+-------------------------------------------------------------------------------------------------+
| @@version        | @@version_comment                                                                               |
+------------------+-------------------------------------------------------------------------------------------------+
| 5.7.27-30-57-log | Percona XtraDB Cluster (GPL), Release rel30, Revision 64987d4, WSREP version 31.39, wsrep_31.39 |
+------------------+-------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'wsrep_provider_version';
+------------------------+----------------+
| Variable_name          | Value          |
+------------------------+----------------+
| wsrep_provider_version | 3.39(rb3295e6) |
+------------------------+----------------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 4     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Log example:

2020-10-24T16:47:47.100389Z 2015474331 [Note] WSREP: --------- CONFLICT DETECTED --------
2020-10-24T16:47:47.100414Z 2015474331 [Note] WSREP: cluster conflict due to certification failure for threads:

2020-10-24T16:47:47.100421Z 2015474331 [Note] WSREP: Victim thread:
THD: 2015474331, mode: local, state: executing, conflict: cert failure, seqno: 200651974
SQL: INSERT INTO tbl1 (some_varchar_field, another_varchar_field, some_foreign_key_id, some_datetime_field, another_datetime_field) VALUES (some_varchar_value, another_varchar_value, some_foreign_key_id_value, some_datetime_value, another_datetime_value)

2020-10-24T16:47:47.100437Z 2034527141 [Note] WSREP: --------- CONFLICT DETECTED --------
2020-10-24T16:47:47.100446Z 2030700502 [Note] WSREP: --------- CONFLICT DETECTED --------
2020-10-24T16:47:47.100453Z 2034527141 [Note] WSREP: cluster conflict due to certification failure for threads:

2020-10-24T16:47:47.100461Z 2030700502 [Note] WSREP: cluster conflict due to certification failure for threads:

2020-10-24T16:47:47.100466Z 2034527141 [Note] WSREP: Victim thread:
THD: 2034527141, mode: local, state: executing, conflict: cert failure, seqno: 200651977
SQL: INSERT INTO tbl1 (some_varchar_field, another_varchar_field, some_foreign_key_id, some_datetime_field, another_datetime_field) VALUES (some_varchar_value, another_varchar_value, some_foreign_key_id_value, some_datetime_value, another_datetime_value)

2020-10-24T16:47:47.100470Z 2030700502 [Note] WSREP: Victim thread:
THD: 2030700502, mode: local, state: executing, conflict: cert failure, seqno: 200651978
SQL: INSERT INTO tbl2 (some_foreign_key_id, another_foreign_key_id, one_more_foreign_key_id, some_varchar_field) VALUES (some_foreign_key_value, another_foreign_key_value, one_more_foreign_key_value, some_varchar_value)

tbl1 and tbl2 are not related in any way; they belong to different databases.

What can lead to these errors? How can I identify the cause?

Hi,

Could you try setting this variable on all nodes:

wsrep_certification_rules='OPTIMIZED'

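That should help with conflicts like these on tables that have FK constraints. Under the default STRICT rules, two INSERTs into a child table on different nodes can fail certification when both rows reference the same parent row, even though the inserted rows themselves do not conflict. OPTIMIZED relaxes that check.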
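
A minimal sketch of how the change could be applied and checked, assuming wsrep_certification_rules can be set at runtime in your PXC 5.7 release (please verify against the documentation for your exact version). The two status counters are standard wsrep counters; comparing them before and after the change is only a suggestion for judging whether the conflicts drop.

```sql
-- Run on every node. Assumption: the variable is dynamic in this release.
SET GLOBAL wsrep_certification_rules = 'OPTIMIZED';

-- Confirm the value now in effect on this node.
SHOW GLOBAL VARIABLES LIKE 'wsrep_certification_rules';

-- Local transactions that failed certification on this node.
SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';

-- Local transactions aborted by replicated (applier) transactions.
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';
```

To keep the setting across restarts, the same line (wsrep_certification_rules=OPTIMIZED) would also go under the [mysqld] section of my.cnf on each node.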