Not the answer you need?
Register and ask your own question!

Percona xtradb cluster crash: HA_ERR_ROW_IS_REFERENCED

jjostjjost EntrantCurrent User Role Beginner
Hi,

We migrated a series of 100G Databases into different Galera Instances.Its running fine so far. I do have 5-6 years experience with Galera.
The cluster has 5 Nodes and the Loadbalancers always targets 1 Node for Read/Write and only takes the second(third...) when the primary node is down (or reports wsrep_ready off).

Version is the same on all servers:
Server version: 5.6.37-82.2-56-log Percona XtraDB Cluster (GPL), Release rel82.2, Revision 114f2f2, WSREP version 26.21, wsrep_26.21


In the night, always roughly at the same time, all 4 non-active nodes drop-out of the cluster at the same time with this error:
2018-01-29 01:26:56 33304 [ERROR] Slave SQL: Could not execute Delete_rows event on table storedb.object; Cannot delete or update a parent row: a foreign key constraint fails (`storedb`.`object`, CONSTRAINT `fk_object_object` FOREIGN KEY (`parentid`) REFERENCES `object` (`objectid`)), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 250, Error_code: 1451
2018-01-29 01:26:56 33304 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 3097067
2018-01-29 01:26:56 33304 [Warning] WSREP: Failed to apply app buffer: seqno: 3097067, status: 1


The table is looking like this:
| object | CREATE TABLE `object` (
`objectid` int(11) NOT NULL AUTO_INCREMENT,
`classid` int(11) NOT NULL,
`parentid` int(11) DEFAULT NULL,
`alias` varchar(255) COLLATE utf8_bin NOT NULL,
`siteid` int(11) NOT NULL,
PRIMARY KEY (`objectid`),
UNIQUE KEY `u_object_guid` (`guid`),
UNIQUE KEY `u_object_alias` (`parentid`,`alias`),
KEY `i_object_class` (`classid`),
KEY `i_object_siteid_classid_alias` (`siteid`,`classid`,`alias`),
CONSTRAINT `fk_object_object` FOREIGN KEY (`parentid`) REFERENCES `object` (`objectid`),
CONSTRAINT `fk_object_object_1` FOREIGN KEY (`siteid`) REFERENCES `object` (`objectid`)
) ENGINE=InnoDB AUTO_INCREMENT=<verylargenumber> DEFAULT CHARSET=utf8 COLLATE=utf8_bin |


The Query that is generating the error is *really* easy:
DELETE FROM `storedb`.`object` WHERE objectid='highnumber';


I saved the datadir from the moment before the cluster crashed (just kept one node out of the cluster after the crash and started it without wsrep provider to have a state to play with).
I can issue the query above with no problem. It deletes the row.

A corresponding GRA.log file is generated but mysqbinlog complains that it cant read the event:
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 105, event_type: 32


I am running weekly consistency checks (percona pt-table-checksum) and they report no inconsistency.

Any idea what is happening here?

Greetings
julian

Comments

  • jjostjjost Entrant Current User Role Beginner
    Hey,

    as an update: It is NOT happening when wsrep_slave_threads are set to 1.
    But this is not a solution as it slows down the Cluster.
  • krunalbauskarkrunalbauskar Advisor Inactive User Role Novice
    Hi,

    Do you have reproducible test-case? I tried looking at the output but don't foresee any immediate clue.
  • jjostjjost Entrant Current User Role Beginner
    Hi,

    I am still working on that. It seems quite hard to trigger (with my limited skills in programming). With bash, mysql binary and mysqlimport, even in parrallel i can't trigger it. I *do* have code inside a big application i can run and it crashes in a few seconds but that is nothing i can provide so far.

    Greetings
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.