Hi,
We migrated a series of 100G Databases into different Galera Instances.Its running fine so far. I do have 5-6 years experience with Galera.
The cluster has 5 Nodes and the Loadbalancers always targets 1 Node for Read/Write and only takes the second(third…) when the primary node is down (or reports wsrep_ready off).
Version is the same on all servers:
Server version: 5.6.37-82.2-56-log Percona XtraDB Cluster (GPL), Release rel82.2, Revision 114f2f2, WSREP version 26.21, wsrep_26.21
In the night, always roughly at the same time, all 4 non-active nodes drop-out of the cluster at the same time with this error:
2018-01-29 01:26:56 33304 [ERROR] Slave SQL: Could not execute Delete_rows event on table storedb.object; Cannot delete or update a parent row: a foreign key constraint fails (storedb
.object
, CONSTRAINT fk_object_object
FOREIGN KEY (parentid
) REFERENCES object
(objectid
)), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event’s master log FIRST, end_log_pos 250, Error_code: 1451
2018-01-29 01:26:56 33304 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 3097067
2018-01-29 01:26:56 33304 [Warning] WSREP: Failed to apply app buffer: seqno: 3097067, status: 1
The table is looking like this:
| object | CREATE TABLE object
(
objectid
int(11) NOT NULL AUTO_INCREMENT,
classid
int(11) NOT NULL,
parentid
int(11) DEFAULT NULL,
alias
varchar(255) COLLATE utf8_bin NOT NULL,
siteid
int(11) NOT NULL,
PRIMARY KEY (objectid
),
UNIQUE KEY u_object_guid
(guid
),
UNIQUE KEY u_object_alias
(parentid
,alias
),
KEY i_object_class
(classid
),
KEY i_object_siteid_classid_alias
(siteid
,classid
,alias
),
CONSTRAINT fk_object_object
FOREIGN KEY (parentid
) REFERENCES object
(objectid
),
CONSTRAINT fk_object_object_1
FOREIGN KEY (siteid
) REFERENCES object
(objectid
)
) ENGINE=InnoDB AUTO_INCREMENT= DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
The Query that is generating the error is really easy:
DELETE FROM storedb
.object
WHERE objectid=‘highnumber’;
I saved the datadir from the moment before the cluster crashed (just kept one node out of the cluster after the crash and started it without wsrep provider to have a state to play with).
I can issue the query above with no problem. It deletes the row.
A corresponding GRA.log file is generated but mysqbinlog complains that it cant read the event:
ERROR: Error in Log_event::read_log_event(): ‘Found invalid event in binary log’, data_len: 105, event_type: 32
I am running weekly consistency checks (percona pt-table-checksum) and they report no inconsistency.
Any idea what is happening here?
Greetings
julian