Hi all,
We have a 3 node cluster set up, with one of the three nodes acting as a replication slave to a standard Percona Server. There is no other traffic on the server at this time.
Strangely, we’re finding that every now and then one node (sometimes 2 nodes) will die due to node consistency being compromised . The transactions in question are being committed without problem on the replication master and the node in the cluster that is acting as replication slave.
We’re currently running the latest version of Percona Cluster as installed from the Percona repository (Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64 and Percona-XtraDB-Cluster-galera-2.0-1.117.rhel6.x86_64) and have the following configuration applied on all three nodes (with appropriate address changes for various wsrep settings):
[mysqld_safe]wsrep_urls=gcomm://192.168.160.5:4567,gcomm://192.168.160.8:4567[mysqld]datadir=/var/lib/mysqluser=mysqlmax_connections=4000server-id=40binlog_format=ROWdefault_storage_engine=InnoDBlong_query_time=10log_error=/var/log/mysql/error_loglog_slow_queries=/var/log/mysql/slow_query_loglog_slow_verbosity=fullsync_binlog=0log_bin = bin-loglog_slave_updates = 1wsrep_node_address=192.168.160.4wsrep_sst_receive_address=192.168.160.4wsrep_sst_auth=root:742evergreentcewsrep_provider=/usr/lib64/libgalera_smm.sowsrep_slave_threads=2wsrep_sst_method=xtrabackupwsrep_cluster_name=percona_clusterwsrep_node_name=node1wsrep_replicate_myisam=1innodb_buffer_pool_instances=1innodb_data_home_dir=/data/mysqlinnodb_log_group_home_dir=/data/mysqlinnodb_data_file_path=ibdata1:1G:autoextendinnodb_buffer_pool_size=96Ginnodb_flush_method=ALL_O_DIRECTinnodb_additional_mem_pool_size=256Minnodb_log_buffer_size=256Minnodb_log_file_size=256Minnodb_flush_log_at_trx_commit=0innodb_read_io_threads=8innodb_autoinc_lock_mode=2innodb_locks_unsafe_for_binlog=1
When the error does occur, the following error occurs (always involving the same tables):
121123 15:03:32 [ERROR] Slave SQL: Could not execute Write_rows event on table nablive.t_schedule_payment; Cannot add or update a child row: a foreign key constraint fails (ntproddb
.t_schedule_payment
, CONSTRAINT t_schedule_payment_ibfk_1
FOREIGN KEY (sched_id
) REFERENCES t_schedule
(id
) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event’s master log FIRST, end_log_pos 232, Error_code: 1452121123 15:03:32 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 1663670121123 15:03:32 [ERROR] WSREP: Failed to apply trx: source: eed70aa6-346b-11e2-0800-24241b746e7d version: 2 local: 0 state: APPLYING flags: 1 conn_id: 4 trx_id: 676177914 seqnos (l: 59090, g: 1663670, s: 1663669, d: 1663365, ts: 1353643412747819106)121123 15:03:32 [ERROR] WSREP: Failed to apply app buffer: ���P�, seqno: 1663670, status: WSREP_FATAL at galera/src/replicator_smm.cpp:apply_wscoll():49 at galera/src/replicator_smm.cpp:apply_trx_ws():120121123 15:03:32 [ERROR] WSREP: Node consistency compromized, aborting…121123 15:03:32 [Note] WSREP: Closing send monitor…121123 15:03:32 [Note] WSREP: Closed send monitor.121123 15:03:32 [Note] WSREP: gcomm: terminating thread121123 15:03:32 [Note] WSREP: gcomm: joining thread121123 15:03:32 [Note] WSREP: gcomm: closing backend121123 15:03:32 [Note] WSREP: view(view_id(NON_PRIM,05aece68-3511-11e2-0800-5e30f373f04f,9) memb { 05aece68-3511-11e2-0800-5e30f373f04f,} joined {} left {} partitioned { eed70aa6-346b-11e2-0800-24241b746e7d, fbc2a20c-3503-11e2-0800-2365c6b05d93,})121123 15:03:32 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1121123 15:03:32 [Note] WSREP: view((empty))121123 15:03:32 [Note] WSREP: gcomm: closed121123 15:03:32 [Note] WSREP: Flow-control interval: [16, 16]121123 15:03:32 [Note] WSREP: Received NON-PRIMARY.121123 15:03:32 [Note] WSREP: Shifting SYNCED → OPEN (TO: 1663674)121123 15:03:32 [Note] WSREP: Received self-leave message.121123 15:03:32 [Note] WSREP: Flow-control interval: [0, 0]121123 15:03:32 [Note] WSREP: Received SELF-LEAVE. Closing connection.121123 15:03:32 [Note] WSREP: Shifting OPEN → CLOSED (TO: 1663674)121123 15:03:32 [Note] WSREP: RECV thread exiting 0: Success121123 15:03:32 [Note] WSREP: recv_thread() joined.121123 15:03:32 [Note] WSREP: Closing slave action queue.121123 15:03:32 [Note] WSREP: /usr/sbin/mysqld: Terminated.121123 15:03:33 mysqld_safe Number of processes running now: 0121123 15:03:33 mysqld_safe WSREP: not restarting wsrep node automatically
The create table statements for the two tables in the FK relationship that are at the centre of this error are:
CREATE TABLE t_schedule
( id
int(11) NOT NULL AUTO_INCREMENT, name
varchar(50) NOT NULL, version
int(11) NOT NULL, client_ref
varchar(50) NOT NULL, freq
int(11) NOT NULL, start_dt
datetime NOT NULL, to_dt
datetime DEFAULT NULL, exec_cnt
int(11) DEFAULT NULL, created_by
int(11) NOT NULL, created_on
datetime NOT NULL, payment_type
int(11) NOT NULL, lastmod_by
int(11) DEFAULT NULL, lastmod_on
datetime DEFAULT NULL, merchant_id
varchar(6) NOT NULL, type_id
smallint(6) NOT NULL DEFAULT ‘0’, KEY merchant_id_fk
(merchant_id
), KEY id
(id
), KEY fk_schedule_isof_type
(type_id
), CONSTRAINT merchant_id_fk
FOREIGN KEY (merchant_id
) REFERENCES t_merchants
(Merchant_ID
), CONSTRAINT t_schedule_ibfk_1
FOREIGN KEY (type_id
) REFERENCES t_schedule_type
(id
) ) ENGINE=InnoDB AUTO_INCREMENT=71507 DEFAULT CHARSET=latin1 CREATE TABLE t_schedule_payment
( id
int(11) NOT NULL AUTO_INCREMENT, version
int(11) NOT NULL, processed
bit(1) NOT NULL, cc_holder
varchar(50) DEFAULT NULL, cc_exp_yy
varchar(2) DEFAULT NULL, cc_exp_mm
varchar(2) DEFAULT NULL, created_by
int(11) NOT NULL, created_on
datetime NOT NULL, cc_no
varchar(60) DEFAULT NULL, transfer_dt
datetime NOT NULL, lastmod_by
int(11) DEFAULT NULL, lastmod_on
datetime DEFAULT NULL, amt
int(11) DEFAULT NULL, canceled
bit(1) NOT NULL, pan
varchar(12) DEFAULT NULL, sched_id
int(11) NOT NULL, currency
char(3) NOT NULL DEFAULT ‘AUD’, card_seq_no
char(3) DEFAULT NULL, crn
varchar(20) DEFAULT NULL, dd_bsb
varchar(20) DEFAULT NULL, dd_acc_no
varchar(50) DEFAULT NULL, dd_acc_name
varchar(255) DEFAULT NULL, batch_id
int(11) DEFAULT NULL, batch_trxid
int(11) DEFAULT NULL, KEY id
(id
), KEY sched_id_fk
(sched_id
), CONSTRAINT t_schedule_payment_ibfk_1
FOREIGN KEY (sched_id
) REFERENCES t_schedule
(id
) ON DELETE CASCADE) ENGINE=InnoDB AUTO_INCREMENT=3964335 DEFAULT CHARSET=latin1
To rule out data corruption / referential integrity issues, we’ve run the following query on all the available nodes and gotten the same results:
mysql> select distinct sched_id from t_schedule_payment where sched_id not in (select id from t_schedule);Empty set (2.47 sec)
I’ve seen a few mentions of FK strangeness in this forum recently but it appeared that various patches had solved this issue. Is there something obvious (or not obvious, or undocumented etc) that we’re missing here?
If there’s any further information we can provide, we definitely will.