Not the answer you need?
Register and ask your own question!

Occassional node failure (FK related)

theevilmuppettheevilmuppet EntrantCurrent User Role Beginner
Hi all,

We have a 3 node cluster set up, with one of the three nodes acting as a replication slave to a standard Percona Server. There is no other traffic on the server at this time.

Strangely, we're finding that every now and then one node (sometimes 2 nodes) will die due to node consistency being compromised . The transactions in question are being committed without problem on the replication master and the node in the cluster that is acting as replication slave.

We're currently running the latest version of Percona Cluster as installed from the Percona repository (Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64 and Percona-XtraDB-Cluster-galera-2.0-1.117.rhel6.x86_64) and have the following configuration applied on all three nodes (with appropriate address changes for various wsrep settings):


[mysqld_safe]wsrep_urls=gcomm://192.168.160.5:4567,gcomm://192.168.160.8:4567[mysqld]datadir=/var/lib/mysqluser=mysqlmax_connections=4000server-id=40binlog_format=ROWdefault_storage_engine=InnoDBlong_query_time=10log_error=/var/log/mysql/error_loglog_slow_queries=/var/log/mysql/slow_query_loglog_slow_verbosity=fullsync_binlog=0log_bin = bin-loglog_slave_updates = 1wsrep_node_address=192.168.160.4wsrep_sst_receive_address=192.168.160.4wsrep_sst_auth=root:742evergreentcewsrep_provider=/usr/lib64/libgalera_smm.sowsrep_slave_threads=2wsrep_sst_method=xtrabackupwsrep_cluster_name=percona_clusterwsrep_node_name=node1wsrep_replicate_myisam=1innodb_buffer_pool_instances=1innodb_data_home_dir=/data/mysqlinnodb_log_group_home_dir=/data/mysqlinnodb_data_file_path=ibdata1:1G:autoextendinnodb_buffer_pool_size=96Ginnodb_flush_method=ALL_O_DIRECTinnodb_additional_mem_pool_size=256Minnodb_log_buffer_size=256Minnodb_log_file_size=256Minnodb_flush_log_at_trx_commit=0innodb_read_io_threads=8innodb_autoinc_lock_mode=2innodb_locks_unsafe_for_binlog=1</pre>


When the error does occur, the following error occurs (always involving the same tables):


121123 15:03:32 [ERROR] Slave SQL: Could not execute Write_rows event on table nablive.t_schedule_payment; Cannot add or update a child row: a foreign key constraint fails (`ntproddb`.`t_schedule_payment`, CONSTRAINT `t_schedule_payment_ibfk_1` FOREIGN KEY (`sched_id`) REFERENCES `t_schedule` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 232, Error_code: 1452121123 15:03:32 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 1663670121123 15:03:32 [ERROR] WSREP: Failed to apply trx: source: eed70aa6-346b-11e2-0800-24241b746e7d version: 2 local: 0 state: APPLYING flags: 1 conn_id: 4 trx_id: 676177914 seqnos (l: 59090, g: 1663670, s: 1663669, d: 1663365, ts: 1353643412747819106)121123 15:03:32 [ERROR] WSREP: Failed to apply app buffer: ���P�, seqno: 1663670, status: WSREP_FATAL at galera/src/replicator_smm.cpp:apply_wscoll():49 at galera/src/replicator_smm.cpp:apply_trx_ws():120121123 15:03:32 [ERROR] WSREP: Node consistency compromized, aborting...121123 15:03:32 [Note] WSREP: Closing send monitor...121123 15:03:32 [Note] WSREP: Closed send monitor.121123 15:03:32 [Note] WSREP: gcomm: terminating thread121123 15:03:32 [Note] WSREP: gcomm: joining thread121123 15:03:32 [Note] WSREP: gcomm: closing backend121123 15:03:32 [Note] WSREP: view(view_id(NON_PRIM,05aece68-3511-11e2-0800-5e30f373f04f,9) memb { 05aece68-3511-11e2-0800-5e30f373f04f,} joined {} left {} partitioned { eed70aa6-346b-11e2-0800-24241b746e7d, fbc2a20c-3503-11e2-0800-2365c6b05d93,})121123 15:03:32 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1121123 15:03:32 [Note] WSREP: view((empty))121123 15:03:32 [Note] WSREP: gcomm: closed121123 15:03:32 [Note] WSREP: Flow-control interval: [16, 16]121123 15:03:32 [Note] WSREP: Received NON-PRIMARY.121123 15:03:32 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1663674)121123 15:03:32 [Note] WSREP: Received self-leave message.121123 15:03:32 [Note] WSREP: Flow-control interval: [0, 0]121123 15:03:32 [Note] WSREP: Received SELF-LEAVE. Closing connection.121123 15:03:32 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1663674)121123 15:03:32 [Note] WSREP: RECV thread exiting 0: Success121123 15:03:32 [Note] WSREP: recv_thread() joined.121123 15:03:32 [Note] WSREP: Closing slave action queue.121123 15:03:32 [Note] WSREP: /usr/sbin/mysqld: Terminated.121123 15:03:33 mysqld_safe Number of processes running now: 0121123 15:03:33 mysqld_safe WSREP: not restarting wsrep node automatically</pre>


The create table statements for the two tables in the FK relationship that are at the centre of this error are:


CREATE TABLE `t_schedule` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) NOT NULL, `version` int(11) NOT NULL, `client_ref` varchar(50) NOT NULL, `freq` int(11) NOT NULL, `start_dt` datetime NOT NULL, `to_dt` datetime DEFAULT NULL, `exec_cnt` int(11) DEFAULT NULL, `created_by` int(11) NOT NULL, `created_on` datetime NOT NULL, `payment_type` int(11) NOT NULL, `lastmod_by` int(11) DEFAULT NULL, `lastmod_on` datetime DEFAULT NULL, `merchant_id` varchar(6) NOT NULL, `type_id` smallint(6) NOT NULL DEFAULT '0', KEY `merchant_id_fk` (`merchant_id`), KEY `id` (`id`), KEY `fk_schedule_isof_type` (`type_id`), CONSTRAINT `merchant_id_fk` FOREIGN KEY (`merchant_id`) REFERENCES `t_merchants` (`Merchant_ID`), CONSTRAINT `t_schedule_ibfk_1` FOREIGN KEY (`type_id`) REFERENCES `t_schedule_type` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=71507 DEFAULT CHARSET=latin1 CREATE TABLE `t_schedule_payment` ( `id` int(11) NOT NULL AUTO_INCREMENT, `version` int(11) NOT NULL, `processed` bit(1) NOT NULL, `cc_holder` varchar(50) DEFAULT NULL, `cc_exp_yy` varchar(2) DEFAULT NULL, `cc_exp_mm` varchar(2) DEFAULT NULL, `created_by` int(11) NOT NULL, `created_on` datetime NOT NULL, `cc_no` varchar(60) DEFAULT NULL, `transfer_dt` datetime NOT NULL, `lastmod_by` int(11) DEFAULT NULL, `lastmod_on` datetime DEFAULT NULL, `amt` int(11) DEFAULT NULL, `canceled` bit(1) NOT NULL, `pan` varchar(12) DEFAULT NULL, `sched_id` int(11) NOT NULL, `currency` char(3) NOT NULL DEFAULT 'AUD', `card_seq_no` char(3) DEFAULT NULL, `crn` varchar(20) DEFAULT NULL, `dd_bsb` varchar(20) DEFAULT NULL, `dd_acc_no` varchar(50) DEFAULT NULL, `dd_acc_name` varchar(255) DEFAULT NULL, `batch_id` int(11) DEFAULT NULL, `batch_trxid` int(11) DEFAULT NULL, KEY `id` (`id`), KEY `sched_id_fk` (`sched_id`), CONSTRAINT `t_schedule_payment_ibfk_1` FOREIGN KEY (`sched_id`) REFERENCES `t_schedule` (`id`) ON DELETE CASCADE) ENGINE=InnoDB AUTO_INCREMENT=3964335 DEFAULT CHARSET=latin1</pre>


To rule out data corruption / referential integrity issues, we've run the following query on all the available nodes and gotten the same results:


mysql> select distinct sched_id from t_schedule_payment where sched_id not in (select id from t_schedule);Empty set (2.47 sec)</pre>


I've seen a few mentions of FK strangeness in this forum recently but it appeared that various patches had solved this issue. Is there something obvious (or not obvious, or undocumented etc) that we're missing here?

If there's any further information we can provide, we definitely will.

Comments

  • flyersaflyersa Entrant Current User Role Beginner
    I have the same issues but with primary keys and duplicate keys that in theory should not even happen.
  • shockwavecsshockwavecs Contributor Inactive User Role Beginner
    anyone figure this out?
  • przemekprzemek Percona Support Engineer Percona Staff Role
    Foreign keys are pretty tricky it seems in this kind of clustering solution, plus parallel applying makes it even more complicated. However many FK-related bugs were fixed till now.
    This thread is pretty old, but Shockwavecs - do you experience anything similar?
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.