Data Inconsistency Between MySQL Master and Read Replica Post Master Failover in Mixed Format Replication

MySQL Version: Percona 5.7.39-42

Issue Description:

We are managing a MySQL cluster with one Master, one HSB (Hot Standby), and multiple read replicas using GTID replication. Recently, we encountered a hardware failure causing the master virtual machine to go down. As a result, our auto-failover process kicked in, promoting the HSB to master. During this process:

  • Some replicas were able to reconnect to the new master based on GTID alignment.
  • Other replicas had to be rebuilt due to GTID inconsistencies.

At the time of failover, all replicas (including HSB) had more than 10 minutes of replication lag. Our auto-failover process, however, waits until the HSB lag is synced before triggering the failover.

Problem:
After the failover, some replicas that reconnected to the newly promoted master (HSB) have exhibited data inconsistencies. Specifically, for the same set of values, data is different between the newly promoted master and the replicas. We are unable to identify the cause of this inconsistency.

Details on Data Inconsistency:

We parsed the binary logs of both the newly promoted master and the affected replica:

  1. Master binary logs: The data appears correct.
  2. Replica binary logs: Even though the logs suggest that the data should match the master’s, the replica shows inconsistent data.

Additional validations showed no GTID inconsistencies or errant GTIDs between the master and replica, and there are no direct inserts into the replica (read-only is enabled).

Moreover, the table in question only involves insert transactional DMLs (i.e., no updates or deletes).

Sample Table Structure:

```CREATE TABLE analytics_data (
start_time bigint(20) NOT NULL,
entity_id varchar(30) NOT NULL,
metric_1 int(11) NOT NULL DEFAULT ‘0’,
metric_2 int(11) NOT NULL DEFAULT ‘0’,
spend_amount decimal(10,2) DEFAULT ‘0.00’,
region varchar(30) NOT NULL,
type varchar(30) NOT NULL,
inserted_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
modified_at datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY index_1 (entity_id,start_time,region),
KEY index_2 (entity_id,start_time),
KEY index_3 (modified_at,type),
KEY index_4 (start_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


**Insert Query:**
```INSERT INTO analytics_data (start_time, entity_id, metric_1, metric_2, spend_amount, region, type) 
VALUES (...) 
ON DUPLICATE KEY UPDATE 
metric_1 = metric_1 + VALUES(metric_1), 
metric_2 = metric_2 + VALUES(metric_2), 
spend_amount = spend_amount + VALUES(spend_amount);

Sample Data Difference Between Master and Replica:

+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+
| start_time  | entity_id | metric_1| metric_2  | spend_amount | region | type  | inserted_at          | modified_at          |
+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+
| 202409301830| xxxxxxxxx | 237294  | 4353      | 27992.68  | DC1    | BRAND_xxx | 2024-09-30 21:23:20 | 2024-09-30 23:38:38 |
| 202409301830| xxxxxxxxx | 21704   | 1019      | 7581.40   | DC2    | BRAND_xxx | 2024-09-30 21:23:23 | 2024-09-30 23:38:40 |
| 202409301830| xxxxxxxxx | 230272  | 3975      | 25546.36  | DC3    | BRAND_xxx | 2024-09-30 21:23:21 | 2024-09-30 23:38:39 |
+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+


replica(with data inconsistency)
+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+
| start_time  | entity_id | metric_1| metric_2  | spend_amount | region | type  | inserted_at          | modified_at          |
+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+
| 202409301830| xxxxxxxxx | 451848  | 8534      | 54924.84  | DC1    | BRAND_xxx | 2024-09-30 21:23:20 | 2024-09-30 23:38:38 |
| 202409301830| xxxxxxxxx | 42153   | 2018      | 15016.56  | DC2    | BRAND_xxx | 2024-09-30 21:23:23 | 2024-09-30 23:38:40 |
| 202409301830| xxxxxxxxx | 438160  | 7799      | 50167.20  | DC3    | BRAND_xxx | 2024-09-30 21:23:21 | 2024-09-30 23:38:39 |
+-------------+-----------+---------+-----------+-----------+--------+-------+---------------------+---------------------+

We can see that for the same entity_id, with the same inserted_at and modified_at, the values for metric_1, metric_2, and spend_amount are different in the replica compared to the master which is strange.

Suspected Causes & Investigation:

We reviewed the following scenarios that could lead to data inconsistencies in mixed replication format:

  1. Half-applied transactions (when a transaction updates non-transactional tables and applies some but not all changes):
  • We verified that all tables are transactional (InnoDB).
  1. Gaps in replication (common with multithreaded replicas):
  1. INSERT … ON DUPLICATE KEY UPDATE with multiple primary/unique keys:

But below one we suspect might be related to our usecase:
Link: https://dev.mysql.com/doc/refman/5.7/en/replication-features-transaction-inconsistencies.html
6.If the last transaction in the relay log is only half-received and the multithreaded replica coordinator has started to schedule the transaction to a worker, then STOP SLAVE waits up to 60 seconds for the transaction to be received. After this timeout, the coordinator gives up and aborts the transaction. If the transaction is mixed, it may be left half-completed.

We tried to reproduce the scenario in our stage env, but was unable to.

Next Steps:

We would appreciate your insights into the following:

  • Possible causes of the data inconsistency between master and replica.
  • Any additional checks we can perform to narrow down the cause.
  • Steps to avoid such inconsistencies in the future. (As per percona and other refered articles we have planned to moved to ROW format to avoid this in future)

Let me know if you need any further details.

Hi @pravata_dash,

This is the number #1 reason not to use MIXED. MIXED defaults to STATEMENT and STATEMENT offers no guarantee of data consistency. You need to use ROW (especially since STATEMENT is now deprecated and will be removed soon). ROW contains both the ‘before image’ of the row, and the ‘after image’. This creates a consistency check on replicas. If the row on the replica does not match the ‘before image’, replication will break.

Use pt-table-checksum on a regular basis; (ie: daily or weekly). This will alert you to any data inconsistencies between source and replicas.

Thanks for the details @matthewb.

We will ensure that everything remains ROW only unless there is a genuine need for others. We used pt-table-checksum for the sanity check and will explore automation options for verifying such cases.

Remember, that MIXED is just an alias for STATEMENT in 99% of cases. Also remember that MIXED and STATEMENT will be removed soon, so you won’t be able to use them with 8.4+

Yep, use this weekly and use pt-table-sync to fix data issues. pt-table-sync can read the result from pt-table-checksum and fix only what is discovered to be inconsistent.

1 Like